Python lxml - XPath returns empty list -


I cannot figure out what is wrong with my XPath when trying to extract a value from a webpage table. The method seems correct, since I can extract the page title and other attributes, but I cannot extract the third value; it returns an empty list.

from lxml import html
import requests

test_url = 'sc312226'
page = 'https://www.opencompany.co.uk/company/' + test_url
print 'now searching url: ' + page

data = requests.get(page)
tree = html.fromstring(data.text)

print tree.xpath('//title/text()')  # page title
print tree.xpath('//a/@href')  # href attribute of links
print tree.xpath('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')

Unless I'm missing something, the XPath appears correct:

(Chrome screenshot)

I checked it in the Chrome console and it appears OK! I'm at a loss.

$x('//*[@id="financial"]/table/tbody/tr/td[1]/table/tbody/tr[2]/td[1]/div[2]/text()')
["£432,272"]

You should specify the element name. If you don't want to specify a particular tag name, you can use *:

print tree.xpath('//*[@id="financial"]/...')
                   ^

Update

In the HTML file (the raw HTML, before it is rendered in the browser), there is no tbody tag. You need to remove tbody from the expression:

//*[@id="financial"]/table/tr/td[1]/table/tr[2]/td[1]/div[2]/text() 
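A minimal sketch of why this happens: browsers insert tbody into the DOM when building a table, but lxml parses the HTML exactly as served, so the tbody step matches nothing. The tiny inline table below is a made-up stand-in for the real page, used only to illustrate the difference:

```python
from lxml import html

# Raw HTML as a server might send it: no <tbody> tags.
raw = ('<div id="financial"><table><tr><td>'
       '<div>total assets</div><div>£432,272</div>'
       '</td></tr></table></div>')
tree = html.fromstring(raw)

# The browser-derived path (with tbody) finds nothing in lxml's tree...
print(tree.xpath('//*[@id="financial"]/table/tbody/tr/td[1]/div[2]/text()'))  # []

# ...while the same path without tbody matches.
print(tree.xpath('//*[@id="financial"]/table/tr/td[1]/div[2]/text()'))  # ['£432,272']
```

The general lesson: XPath expressions copied from browser dev tools describe the rendered DOM, not necessarily the HTML your script downloads.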

An alternative way, using the following-sibling axis:

//div[text()="total assets"]/following-sibling::div/text() 
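The same idea on a stripped-down sample (the markup here is invented for illustration): anchoring on the label text rather than on table structure makes the expression survive layout changes, though note that text()= matching is exact and case-sensitive.

```python
from lxml import html

# Hypothetical fragment standing in for the real financial table.
raw = '<div><div>total assets</div><div>£432,272</div></div>'
tree = html.fromstring(raw)

# Find the div whose text is the label, then take its next sibling div's text.
print(tree.xpath('//div[text()="total assets"]/following-sibling::div/text()'))
# ['£432,272']
```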
