Jsoup removing elements automatically?
I've been using jsoup and have run into what looks like a bug: jsoup automatically removes "table" elements, and I cannot find a workaround...
Document doc = Jsoup.connect("http://www.planet-series.tv/dr-house/").get();
System.out.println(doc);
If you navigate to the link in that piece of code, you can see that there are multiple "table" elements (for example, under "Saison 01 (VF)" there are 22 table elements containing "Episode X"), yet they are absent from the jsoup output...
[Screenshots: expected output, actual result]
I tried fetching the document with a simple HttpClient, printing it (the table elements are there), parsing it with jsoup, and printing it again (the table elements are gone), so I know it's not a JavaScript issue or anything like that, and jsoup is indeed causing it.
Can anyone tell me what I'm missing, please?
Some websites perform optimizations or apply restrictions based on the User-Agent data (the header that the browser attaches to a request to tell the website what type of browser it is). A website may block content if the user agent is not set.
You can try using a simplified Mozilla user agent to simulate a real browser and fetch the data:
Document doc = Jsoup.connect("http://www.planet-series.tv/dr-house/").userAgent("Mozilla").get();
System.out.println(doc);
If that does not work and you really have hit a bug in jsoup, you can fetch the data using HttpClient and create the document using:
Document doc = Jsoup.parse(html);
where html is a String containing the page content.
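As a rough sketch of that fallback, the class below fetches the raw page as a String with an explicit User-Agent header and counts the opening "table" tags in it, so you can check whether the server sent the tables before jsoup ever sees them. It assumes Java 11+ and uses the JDK's built-in java.net.http.HttpClient, which may not be the HttpClient the question had in mind. The names PageFetcher, buildRequest, fetchHtml and countOpeningTags are hypothetical helpers made up for illustration, not part of jsoup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of jsoup or of the original post.
final class PageFetcher {

    // Builds a GET request that carries an explicit User-Agent header.
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    // Downloads the page body as a String, following ordinary redirects.
    static String fetchHtml(String url, String userAgent)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url, userAgent),
                            HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Counts opening tags such as <table> or <TABLE class="x"> in raw HTML.
    // This is a crude text check for debugging, not a real HTML parser.
    static int countOpeningTags(String html, String tagName) {
        Pattern opening = Pattern.compile(
                "<" + Pattern.quote(tagName) + "[\\s>/]",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = opening.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

With jsoup on the classpath, the string returned by PageFetcher.fetchHtml(url, "Mozilla") can then be passed to Jsoup.parse(html, url). If countOpeningTags(html, "table") is already zero on the raw string, the server is withholding the tables and the parser is not at fault.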