Python lxml - XPath returns an empty list
I cannot figure out what is wrong with my XPath when trying to extract a value from a table on a web page. My method seems correct: I can extract the page title and other attributes without trouble, but the third expression, which should pull a value out of the table, returns an empty list. Why?
    from lxml import html
    import requests

    test_url = 'sc312226'
    page = 'https://www.opencompany.co.uk/company/' + test_url
    print('now searching url: ' + page)

    data = requests.get(page)
    tree = html.fromstring(data.text)

    print(tree.xpath('//title/text()'))  # page title
    print(tree.xpath('//a/@href'))       # href attribute of links
    print(tree.xpath('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()'))  # empty list
Unless I'm missing something, the XPath appears to be correct. I checked it in the Chrome console and it works fine there, so I'm at a loss:
    $x('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')
    ["£432,272"]
You should specify an element name at each step; if you don't want to name a specific tag, you can use the * wildcard:

    print(tree.xpath('//*[@id="financial"]/...'))
Update

In the raw HTML file (the HTML as served, before the browser renders it) there is no tbody tag. The browser DOM inserts it, which is why the path works in the Chrome console but not in lxml. You need to remove tbody from the expression:

    //*[@id="financial"]/table/tr/td[1]/table/tr[2]/td[1]/div[2]/text()
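The effect is easy to reproduce locally. Below is a minimal sketch using an invented table fragment (the real page's markup is more elaborate): lxml parses the HTML exactly as served, so the tbody-qualified path from the browser matches nothing, while the same path without tbody finds the value.

```python
from lxml import html

# Hypothetical fragment mimicking the page's structure (not the real markup);
# note there is no <tbody> in the served HTML.
doc = html.fromstring('''
<div id="financial">
  <table><tr><td>
    <table>
      <tr><td><div>label</div><div>header</div></td></tr>
      <tr><td><div>total assets</div><div>£432,272</div></td></tr>
    </table>
  </td></tr></table>
</div>
''')

# Path copied from the browser DOM, which inserts <tbody>: matches nothing here.
with_tbody = doc.xpath('//*[@id="financial"]/table/tbody/tr/td[1]'
                       '/table/tbody/tr[2]/td[1]/div[2]/text()')

# Same path with tbody removed: matches the value.
without_tbody = doc.xpath('//*[@id="financial"]/table/tr/td[1]'
                          '/table/tr[2]/td[1]/div[2]/text()')

print(with_tbody)     # []
print(without_tbody)  # ['£432,272']
```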
An alternative way is to use the following-sibling axis:

    //div[text()="total assets"]/following-sibling::div/text()
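A rough sketch of the sibling-based approach, again on an invented fragment; the label text ("total assets") and its exact casing are assumptions that must match the real page:

```python
from lxml import html

# Invented fragment: a label <div> followed by the value <div>.
doc = html.fromstring('''
<div>
  <div>total assets</div>
  <div>£432,272</div>
</div>
''')

# Locate the label by its text, then take the text of the <div> that follows it.
value = doc.xpath('//div[text()="total assets"]/following-sibling::div/text()')
print(value)  # ['£432,272']
```

This tends to be more robust than a long positional path, since it keeps working if the surrounding table layout changes.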