Python lxml - XPath returns an empty list
I cannot figure out what is wrong with my XPath when trying to extract a value from a table on a web page. My method seems correct: I can extract the page title and other attributes without trouble, but the third expression, which should pull a value out of the table, returns an empty list. Why?
    from lxml import html
    import requests

    test_url = 'sc312226'
    page = 'https://www.opencompany.co.uk/company/' + test_url
    print('now searching url: ' + page)

    data = requests.get(page)
    tree = html.fromstring(data.text)

    print(tree.xpath('//title/text()'))  # page title
    print(tree.xpath('//a/@href'))       # href attribute of links
    print(tree.xpath('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()'))  # empty list
Unless I'm missing something, the XPath appears to be correct. I checked it in the Chrome console and it works fine there, so I'm at a loss:
    $x('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')
    ["£432,272"]
You should specify an element name at each step; if you don't want to name a specific tag, you can use the * wildcard:

    print(tree.xpath('//*[@id="financial"]/...'))
Update

In the raw HTML file (the HTML as served, before the browser renders it) there is no tbody tag. The browser DOM inserts it, which is why the path works in the Chrome console but not in lxml. You need to remove tbody from the expression:

    //*[@id="financial"]/table/tr/td[1]/table/tr[2]/td[1]/div[2]/text()
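The effect is easy to reproduce locally. Below is a minimal sketch using an invented table fragment (the real page's markup is more elaborate): lxml parses the HTML exactly as served, so the tbody-qualified path from the browser matches nothing, while the same path without tbody finds the value.

```python
from lxml import html

# Hypothetical fragment mimicking the page's structure (not the real markup);
# note there is no <tbody> in the served HTML.
doc = html.fromstring('''
<div id="financial">
  <table><tr><td>
    <table>
      <tr><td><div>label</div><div>header</div></td></tr>
      <tr><td><div>total assets</div><div>£432,272</div></td></tr>
    </table>
  </td></tr></table>
</div>
''')

# Path copied from the browser DOM, which inserts <tbody>: matches nothing here.
with_tbody = doc.xpath('//*[@id="financial"]/table/tbody/tr/td[1]'
                       '/table/tbody/tr[2]/td[1]/div[2]/text()')

# Same path with tbody removed: matches the value.
without_tbody = doc.xpath('//*[@id="financial"]/table/tr/td[1]'
                          '/table/tr[2]/td[1]/div[2]/text()')

print(with_tbody)     # []
print(without_tbody)  # ['£432,272']
```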
An alternative way is to use the following-sibling axis:

    //div[text()="total assets"]/following-sibling::div/text()
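A rough sketch of the sibling-based approach, again on an invented fragment; the label text ("total assets") and its exact casing are assumptions that must match the real page:

```python
from lxml import html

# Invented fragment: a label <div> followed by the value <div>.
doc = html.fromstring('''
<div>
  <div>total assets</div>
  <div>£432,272</div>
</div>
''')

# Locate the label by its text, then take the text of the <div> that follows it.
value = doc.xpath('//div[text()="total assets"]/following-sibling::div/text()')
print(value)  # ['£432,272']
```

This tends to be more robust than a long positional path, since it keeps working if the surrounding table layout changes.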