python - Parsing a long HTML page with BeautifulSoup fails with half-parsed output

January 15, 2013

I used the following script to parse the prices of a particular fund:

```python
import pandas as pd
from bs4 import BeautifulSoup
from ghost import Ghost

ghost = Ghost()
page, resources = ghost.open('http://bank.hangseng.com/1/pa_1_1_p1/comsvlet_minisite_eng_gif?app=einvcfunddetailsov&pri_fund_code=u44217')
# run the page's agree() function, then switch to the price/dividend view
page, resources = ghost.evaluate("agree()", expect_loading=True)
page, resources = ghost.evaluate("mm_changeview('einvcfundpricedividend')", expect_loading=True)
# ghost.capture_to("hangseng.png")
soup = BeautifulSoup(page.content)
soup
```

The soup output is fine for the first half, but after that the tags turn into uppercase and BeautifulSoup cannot parse them. Here is the point where it goes wrong:

```
<td class="lightgrey" valign="top"><font class="content">22-07-2014</font></td><td class="lightgrey" valign="top"><font class="content">10.95000</font></td><td class="lightgrey" valign="top"><font class="content">11.39000</font></td><td class="lightgrey" valign="top"><font class="content">10.95000</font></td> </tr>
t r   v a l i g n = " t o p "   a l i g n = " c e n t e r " &gt;   t d   c l a s s = " l i g h t g r e y "   v a l i g n = " t o p " &gt; f o n t   c l a s s = " c o n t e n t " &gt; 2 1 - 0 7 - 2 0 1 4 / f o n t &gt; / t d &gt; t d   c l a s s = " l i g h t g r e y "   v a l i g n = " t o p " &gt; f o n t   c l a s s = " c o n t e n t " &gt; 1 0 . 9 6 0 0 0 / f o n t &gt; / t d &gt; t d   c l a s s = " l i g h t g r e y "   v a l i g n = " t o p " &gt; f o n t   c l a s s = " c o n t e n t " &gt; 1 1 . 4 0 0 0 0 / f o n t &gt; / t d &gt; t d   c l a s s = " l i g h t g r e y "   v a l i g n = " t o p " &gt; f o n t   c l a s s = " c o n t e n t " &gt; 1 0 . 9 6 0 0 0 / f o n t &gt; / t d &gt;   / t r &gt;
```

As you can see, the output turns into garbage after the row dated 22-07-2014.

What happened?

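I found the solution to the spaced output: tell BeautifulSoup explicitly which parser to use, so that page.content is parsed with Python's built-in html.parser:

```python
soup = BeautifulSoup(page.content, 'html.parser')
```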

Now it works perfectly.
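For anyone wondering what the 'html.parser' option changes: BeautifulSoup's 'html.parser' backend is built on the standard library's html.parser.HTMLParser, which passes tag and attribute names to its handlers in lowercase, so <TR> and <tr> are treated as the same tag. That is one property that plausibly matters for mixed-case markup like this page, although the exact reason the default parser failed is not established here. The stdlib-only sketch below illustrates the lowercasing; it is not the code from the question. FundPriceParser is a hypothetical name, and the sample markup is an assumed, trimmed version of the fund-price rows above, with the second row written in uppercase to mimic the tags described in the question.

```python
from html.parser import HTMLParser


class FundPriceParser(HTMLParser):
    """Hypothetical helper: collect the text of every <td>, grouped by <tr>.

    HTMLParser hands tag names to the handlers in lowercase, so <TR>/<tr>
    and <TD>/<td> are treated identically.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Only keep text that sits inside an open <td>.
        if self._cell is not None:
            self._cell.append(data)


# Assumed sample: two rows modeled on the output above, second row uppercased.
sample = """
<table>
<tr valign="top"><td class="lightgrey"><font class="content">22-07-2014</font></td>
<td class="lightgrey"><font class="content">10.95000</font></td></tr>
<TR VALIGN="top"><TD CLASS="lightgrey"><FONT CLASS="content">21-07-2014</FONT></TD>
<TD CLASS="lightgrey"><FONT CLASS="content">10.96000</FONT></TD></TR>
</table>
"""

parser = FundPriceParser()
parser.feed(sample)
parser.close()
print(parser.rows)
# [['22-07-2014', '10.95000'], ['21-07-2014', '10.96000']]
```

Both rows come out the same way even though the second one uses uppercase tags, because the comparisons in the handlers only ever see lowercase names.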
