python - BeautifulSoup unable to find classes with hyphens in their name


I am using BeautifulSoup4 on Mac OS X running Python 2.7.8. I am having difficulty extracting information from the following HTML code:

    <tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
        <tr id="yui-rec0" class="yui-dt-first yui-dt-even">
            <td headers="yui-dt0-th-rank" class="rank yui-dt0-col-rank"></td>
        </tr>
        <tr id="yui-rec1" class="yui-dt-odd">...</tr>
        <tr id="yui-rec2" class="yui-dt-even">...</tr>
    </tbody>

I can't seem to grab the table or any of its contents, because BS and/or Python doesn't seem to recognize values with hyphens. The usual code, like

 table = soup.find('tbody',{'class':'yui-dt-data'}) 

or

 row2 = table.find('tr',{'id':'yui-rec2'}) 

just returns an empty object (not None, just empty). I'm not new to BS4 or Python, and I've extracted information from this site before, but the class names were different when I did it. Now everything has hyphens. Is there a way to get Python to recognize the hyphens, or a workaround?

I need the code to be general so that I can run it across numerous pages that have the same class name. Unfortunately, the id attribute in <tbody> is unique to that particular table, so I can't use it to identify the table across webpages.

Any help is appreciated. Thanks in advance.

The following code:

    from bs4 import BeautifulSoup

    # Same markup as in the question, with the <tr> tags left unclosed
    htmlstring = """ <tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
          <tr id="yui-rec0" class="yui-dt-first yui-dt-even">
          <tr id="yui-rec1" class="yui-dt-odd">
          <tr id="yui-rec2" class="yui-dt-even">"""

    soup = BeautifulSoup(htmlstring)

    # Look up the table body by its hyphenated class name
    table = soup.find('tbody', attrs={'class': 'yui-dt-data'})
    print("table:\n")
    print(table)

    # Look up a row inside it, again by a hyphenated class name
    tr = table.find('tr', attrs={'class': 'yui-dt-odd'})
    print("tr:\n")
    print(tr)

outputs:

    table:

    <tbody class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650" tabindex="0">
    <tr class="yui-dt-first yui-dt-even" id="yui-rec0">
    <tr class="yui-dt-odd" id="yui-rec1">
    <tr class="yui-dt-even" id="yui-rec2"></tr></tr></tr></tbody>
    tr:

    <tr class="yui-dt-odd" id="yui-rec1">
    <tr class="yui-dt-even" id="yui-rec2"></tr></tr>

Even though the HTML you supplied isn't valid, it seems BS is making a guess at how it should be, because soup.prettify() yields

    <tbody class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650" tabindex="0">
     <tr class="yui-dt-first yui-dt-even" id="yui-rec0">
      <tr class="yui-dt-odd" id="yui-rec1">
       <tr class="yui-dt-even" id="yui-rec2">
       </tr>
      </tr>
     </tr>
    </tbody>

though I'm guessing the tr's aren't supposed to be nested.

Could you try running this exact code and seeing what the output is?
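For comparison, here is a minimal sketch of the same lookups against a version of the table where every `<tr>` is explicitly closed. The wrapping `<table>` element, the cell text (`1`, `2`, `3`) and the variable names are made up for illustration, and the parser is fixed to `"html.parser"` on the assumption that this keeps the result independent of which optional parsers are installed. It suggests the hyphens themselves are not a problem: the class names are matched as plain strings.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed variant of the question's markup:
# every <tr> is closed, and the cell text is invented for the demo.
html = """
<table>
<tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
  <tr id="yui-rec0" class="yui-dt-first yui-dt-even"><td class="rank yui-dt0-col-rank">1</td></tr>
  <tr id="yui-rec1" class="yui-dt-odd"><td class="rank yui-dt0-col-rank">2</td></tr>
  <tr id="yui-rec2" class="yui-dt-even"><td class="rank yui-dt0-col-rank">3</td></tr>
</tbody>
</table>
"""

# Name the parser explicitly instead of letting bs4 pick one.
soup = BeautifulSoup(html, "html.parser")

# Two equivalent ways to match a hyphenated class name.
by_attrs = soup.find("tbody", attrs={"class": "yui-dt-data"})
by_keyword = soup.find("tbody", class_="yui-dt-data")

# Rows are direct children of <tbody> here, not nested in each other.
rows = by_attrs.find_all("tr", recursive=False)
print([r["id"] for r in rows])       # ['yui-rec0', 'yui-rec1', 'yui-rec2']

# A class search matches any one of a tag's space-separated classes,
# so 'yui-dt-even' also finds the row with class "yui-dt-first yui-dt-even".
even_rows = by_attrs.find_all("tr", class_="yui-dt-even")
print([r["id"] for r in even_rows])  # ['yui-rec0', 'yui-rec2']

# Lookup by a hyphenated id works the same way.
row2 = by_attrs.find("tr", id="yui-rec2")
print(row2.td.get_text())            # 3
```

Because the lookup uses only the class name and never the table-specific id, the same call should carry over to other pages that use the `yui-dt-data` class. If it still comes back empty on the real page, the table may not be in the downloaded HTML at all (for example, if it is built by JavaScript after the page loads), which is worth checking by printing the raw page source.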

