How to use Python and BeautifulSoup to print timestamp/last updated time (from HTML) for each row?
How can I use Python and BeautifulSoup to print a timestamp and the last updated time (from the HTML) for each row? Thanks a lot!
1) Can I print a) the date/time and b) the last updated time after each row?
a) Date/time: the time at which the Python code is executed
b) Last updated time: taken from the HTML
HTML structure:
One td containing 2 tables. Each table has a few "tr" elements, and each "tr" has a few "td" elements with the data inside.
HTML:
    <td>
      <table width="100%" border="4" cellspacing="0" bordercolor="white" align="center">
        <tbody>
          <tr>
            <td colspan="2" class="verd_black11">last updated: 18/08/2014 10:19</td>
          </tr>
          <tr>
            <td colspan="3" class="verd_black11">all data delayed @ least 15 minutes</td>
          </tr>
        </tbody>
      </table>
      <table width="100%" border="4" cellspacing="0" bordercolor="white" align="center">
        <tbody id="tbody">
          <tr id="tr0" class="tablehdrb1" align="center">
            <td align="centre">c aug-14 - 15000</td>
            <td align="right"> - </td>
            <td align="right">5</td>
            <td align="right">9,904</td>
          </tr>
        </tbody>
      </table>
    </td>
Code:
    import urllib2
    from bs4 import BeautifulSoup

    contenturl = "html:"
    soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

    # Only the second table has an id, so this selects the data rows only
    table = soup.find('tbody', attrs={'id': 'tbody'})

    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            # First text node inside the cell
            t = td.find(text=True)
            if t:
                text = t + ';'
                print text,
        print
Output of the above code:
c aug-14 - 15000 ; - ; 5 ; 9,904
Expected output:
c aug-14 - 15000 ; - ; 5 ; 9,904 ; 18/08/2014 ; 13:48:00 ; 18/08/2014 ; 10:19
(the first date/time pair is when the Python code is executed; the second is the last updated time)
To get the time the code executes:
    from datetime import datetime
    code_executes = datetime.now()
Then you can pick the pieces you need out of code_executes (as a tuple, or formatted into a string) and later insert them into the print statement below.
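As a rough sketch of that formatting step, the snippet below turns a datetime into the two strings shown in the expected output. It is written for Python 3 rather than the Python 2 of the question, and the helper name format_execution_time is made up for illustration.

```python
from datetime import datetime


def format_execution_time(moment):
    """Return (date, time) strings in the DD/MM/YYYY and HH:MM:SS layout."""
    return moment.strftime("%d/%m/%Y"), moment.strftime("%H:%M:%S")


# Time at which this script runs
code_executes = datetime.now()
run_date, run_time = format_execution_time(code_executes)
print(run_date + " ; " + run_time)
```

The two strings can then be appended to each row in the same way as the other fields.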
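To get the "last updated" bit, you need to read the top table, which you currently don't because it doesn't have an id attribute for you to specify. If the HTML has only these 2 tables, you can drop the id specification from the Beautiful Soup call and get both tables.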
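One possible way to do this, assuming the page really contains just these two tables in this order, is to fetch every table and take the first one as the header. The sketch is in Python 3 and parses a trimmed copy of the HTML from the question instead of fetching a URL; the names SAMPLE_HTML, header_table and data_table are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question: one outer <td> holding two tables.
SAMPLE_HTML = """
<td>
  <table width="100%">
    <tbody>
      <tr><td colspan="2" class="verd_black11">last updated: 18/08/2014 10:19</td></tr>
      <tr><td colspan="3" class="verd_black11">all data delayed @ least 15 minutes</td></tr>
    </tbody>
  </table>
  <table width="100%">
    <tbody id="tbody">
      <tr id="tr0" class="tablehdrb1" align="center">
        <td align="centre">c aug-14 - 15000</td>
        <td align="right"> - </td>
        <td align="right">5</td>
        <td align="right">9,904</td>
      </tr>
    </tbody>
  </table>
</td>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No id filter: this returns both tables in document order.
tables = soup.find_all("table")
header_table, data_table = tables[0], tables[1]

# The first cell of the first table holds the "last updated" text.
last_updated_text = header_table.find("td").get_text(strip=True)
data_rows = data_table.find_all("tr")

print(last_updated_text)
```

Relying on table order is fragile if the page layout changes, so checking the cell text before using it is a sensible safeguard.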
To get the time, use str.startswith("last updated:") and then grab whatever comes after it (a string slice would work).
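A minimal sketch of that check and slice, in Python 3. It assumes the cell text has the form "last updated: DD/MM/YYYY HH:MM" and compares case-insensitively in case the real page capitalises the label; parse_last_updated is a hypothetical helper name.

```python
PREFIX = "last updated:"


def parse_last_updated(cell_text):
    """Return (date, time) from a 'last updated: ...' cell, or None if it does not match."""
    text = cell_text.strip()
    if not text.lower().startswith(PREFIX):
        return None
    # Slice off the prefix, then split the remainder into date and time.
    parts = text[len(PREFIX):].split()
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


print(parse_last_updated("last updated: 18/08/2014 10:19"))
```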
In the end, tack the 2 times onto the line when you print it.
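Putting the pieces together could look something like the following Python 3 sketch, which joins the row cells and the two date/time pairs with " ; " as in the expected output. The function build_line and its parameter names are assumptions made for this example, and the values are the ones from the question.

```python
def build_line(cells, run_date, run_time, updated_date, updated_time):
    """Join the row cells and both date/time pairs into one ' ; '-separated line."""
    fields = [cell.strip() for cell in cells]
    fields += [run_date, run_time, updated_date, updated_time]
    return " ; ".join(fields)


line = build_line(
    ["c aug-14 - 15000", " - ", "5", "9,904"],
    "18/08/2014", "13:48:00",   # when the script ran
    "18/08/2014", "10:19",      # last updated time from the page
)
print(line)
```

Building the whole line first and printing it once avoids the trailing-comma print trick used in the original loop.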