data mining - How can schema.org help in NLP? -


I am working on NLP, collecting interest-based data from web pages.

I came across http://schema.org/ as a source that is said to be helpful for NLP work.

I went through the documentation, and I can see that it adds additional tag properties that identify the content of an HTML tag.

It may help a search engine return more specific data for a user query.

It says: "Schema.org provides a collection of shared vocabularies webmasters can use to mark up their pages in ways that can be understood by the major search engines: Google, Microsoft, Yandex and Yahoo!"

But I don't understand how it can help me as an NLP guy. Generally we parse the web page content, process it, and extract data from it. schema.org markup may be there, but I don't know how to utilize it.

Any example or guidance would be appreciated.

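schema.org uses the microdata format for representation (the microformats named below, such as hNews and hReview, are an older, class-based way of marking up the same kind of content). People use this markup for text analytics and for extracting curated content. There can be numerous applications: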

  1. Suppose you want to create a news summarization system. You can use the hNews microformat to extract the relevant content and perform summarization on it.

  2. Suppose you have a review-based search engine and want to list the products with positive reviews. You can use the hReview microformat to extract the reviews, then perform sentiment analysis on them to identify whether a product has a -ve or +ve review.

  3. If you want to create a skill-based resume classifier, extract the content with the hResume microformat. It can give you various details such as contact (which uses the hCard microformat), experience, achievements and related work, education, skills/qualifications, affiliations, publications, etc. You can then run a classifier on it to sort CVs by particular skill sets.
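As a rough illustration of point 2, here is a minimal sketch in Python that only uses the standard library. It assumes the review text sits in an element with class `description` (the property name hReview uses), and it scores polarity with a tiny made-up word list; `POSITIVE`, `NEGATIVE`, `ClassTextExtractor`, `polarity` and `review_polarities` are hypothetical names for this example, and a real system would swap in a proper sentiment model.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Toy lexicons, purely illustrative (an assumption, not a real sentiment resource).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "broken"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.texts = []
        self._depth = 0      # > 0 while we are inside a matching element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.wanted in (dict(attrs).get("class") or "").split():
            self._depth, self._chunks = 1, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Normalise whitespace before storing the captured text.
            self.texts.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def polarity(text):
    """Label text '+ve', '-ve' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "+ve" if score > 0 else "-ve" if score < 0 else "neutral"


def review_polarities(html):
    """Return (review text, polarity) pairs for each hReview description."""
    parser = ClassTextExtractor("description")
    parser.feed(html)
    parser.close()
    return [(text, polarity(text)) for text in parser.texts]


SAMPLE_REVIEWS = """
<div class="hreview">
  <span class="item"><span class="fn">Acme Kettle</span></span>
  <p class="description">Great kettle, boils fast. I love it.</p>
</div>
<div class="hreview">
  <span class="item"><span class="fn">Acme Toaster</span></span>
  <p class="description">Terrible build and a broken lever.</p>
</div>
"""

for text, label in review_polarities(SAMPLE_REVIEWS):
    print(label, "|", text)
```

The point is that the markup tells you where the review text is, so the NLP step does not have to guess which part of the page is the review.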

Though schema.org does not help NLP people directly, it provides a platform for performing text processing in a better way.

Check out http://en.wikipedia.org/wiki/microformat#specific_microformats to see the various microformats; the same page gives more details.
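For schema.org markup itself, the same idea applies to microdata attributes. Below is a minimal sketch, assuming the page marks properties with `itemprop` attributes; the sample HTML and the names `ItempropExtractor` and `extract_properties` are made up for illustration. It is deliberately simplified: a property nested inside another captured property is folded into the outer text, and `itemscope` grouping is ignored.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ItempropExtractor(HTMLParser):
    """Collect (itemprop, value) pairs from microdata-annotated HTML."""

    def __init__(self):
        super().__init__()
        self.properties = []
        self._name = None    # itemprop currently being captured, if any
        self._depth = 0      # nesting depth inside the captured element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._name is not None:
            # Already capturing: only track nesting of non-void elements.
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        name = attrs.get("itemprop")
        if name is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="ratingValue" content="5">
            self.properties.append((name, attrs["content"]))
        elif tag not in VOID_TAGS:
            self._name, self._depth, self._chunks = name, 1, []

    def handle_endtag(self, tag):
        if self._name is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._chunks).split())
            self.properties.append((self._name, text))
            self._name = None

    def handle_data(self, data):
        if self._name is not None:
            self._chunks.append(data)


def extract_properties(html):
    """Return the (itemprop, value) pairs found in `html`, in document order."""
    parser = ItempropExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties


SAMPLE_PAGE = """
<div itemscope itemtype="http://schema.org/Review">
  <span itemprop="name">Great phone</span>
  <meta itemprop="ratingValue" content="5">
  <p itemprop="reviewBody">Battery lasts <b>two days</b>.</p>
</div>
"""

for name, value in extract_properties(SAMPLE_PAGE):
    print(name, "=", value)
```

The extracted `reviewBody`, `name` and similar fields can then be fed to whatever summarizer, sentiment model or classifier you already use.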

