python - BeautifulSoup Unable to Find Classes with Hyphens in Their Name
I'm using BeautifulSoup4 on Mac OS X running Python 2.7.8. I'm having difficulty extracting information from the following HTML code:

    <tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
      <tr id="yui-rec0" class="yui-dt-first yui-dt-even">
        <td headers="yui-dt0-th-rank" class="rank yui-dt0-col-rank"></td>
      </tr>
      <tr id="yui-rec1" class="yui-dt-odd">...</tr>
      <tr id="yui-rec2" class="yui-dt-even">...</tr>
    </tbody>

I can't seem to grab the table or any of its contents, because BS and/or Python doesn't seem to recognize values with hyphens. My usual code, like

    table = soup.find('tbody', {'class': 'yui-dt-data'})

or

    row2 = table.find('tr', {'id': 'yui-rec2'})

just returns an empty object (not None, empty). I'm not new to BS4 or Python, and I've extracted information from this site before, but the class names were different when I did it. Everything now has hyphens. Is there a way to make Python recognize the hyphen, or is there a workaround?
I need the code to be general so that I can run it across numerous pages that have the same class name. Unfortunately, the id attribute in <tbody> is unique to that particular table, so I can't use it to identify the table across webpages.

Any help would be appreciated. Thanks in advance.
The following code:

    from bs4 import BeautifulSoup

    htmlstring = """
    <tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
    <tr id="yui-rec0" class="yui-dt-first yui-dt-even">
    <tr id="yui-rec1" class="yui-dt-odd">
    <tr id="yui-rec2" class="yui-dt-even">"""

    soup = BeautifulSoup(htmlstring)

    # Look up the table body by its hyphenated class name
    table = soup.find('tbody', attrs={'class': 'yui-dt-data'})
    print("table:\n")
    print(table)

    # Look up a row inside it, again by a hyphenated class name
    tr = table.find('tr', attrs={'class': 'yui-dt-odd'})
    print("tr:\n")
    print(tr)

outputs:
    table:
    <tbody class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650" tabindex="0">
    <tr class="yui-dt-first yui-dt-even" id="yui-rec0">
    <tr class="yui-dt-odd" id="yui-rec1">
    <tr class="yui-dt-even" id="yui-rec2"></tr></tr></tr></tbody>
    tr:
    <tr class="yui-dt-odd" id="yui-rec1">
    <tr class="yui-dt-even" id="yui-rec2"></tr></tr>

Even though the HTML you supplied isn't valid, it seems that BS is making a guess as to how it should be, because soup.prettify() yields

    <tbody class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650" tabindex="0">
     <tr class="yui-dt-first yui-dt-even" id="yui-rec0">
      <tr class="yui-dt-odd" id="yui-rec1">
       <tr class="yui-dt-even" id="yui-rec2">
       </tr>
      </tr>
     </tr>
    </tbody>

though I'm guessing those tr's aren't supposed to be nested. Either way, both lookups by hyphenated class name succeed here, so the hyphens themselves do not appear to be the problem.
Could you try running this exact code and seeing what the output is?
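
For comparison, here is a minimal sketch of the same class-based lookup written three equivalent ways (attribute dict, the `class_` keyword, and a CSS selector). It assumes the rows are closed explicitly and that the built-in `html.parser` backend is used, neither of which is stated in the question. `find_rows` is a hypothetical helper name introduced only for illustration. It returns an empty list when the class is absent, which may help tell "the markup is not in the downloaded page" apart from a lookup problem.

```python
from bs4 import BeautifulSoup

# Well-formed variant of the question's markup (assumption: rows are closed).
HTML = """
<table>
<tbody tabindex="0" class="yui-dt-data" id="yui_3_5_0_1_1408418470185_1650">
<tr id="yui-rec0" class="yui-dt-first yui-dt-even"><td class="rank yui-dt0-col-rank"></td></tr>
<tr id="yui-rec1" class="yui-dt-odd"></tr>
<tr id="yui-rec2" class="yui-dt-even"></tr>
</tbody>
</table>
"""


def find_rows(html):
    """Hypothetical helper: ids of all rows inside the first
    <tbody class="yui-dt-data">, or [] if no such tbody exists."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("tbody", class_="yui-dt-data")
    if table is None:
        return []
    return [tr.get("id") for tr in table.find_all("tr")]


soup = BeautifulSoup(HTML, "html.parser")

# Three ways to express the same lookup; hyphens need no escaping in any of them.
by_dict = soup.find("tbody", {"class": "yui-dt-data"})
by_keyword = soup.find("tbody", class_="yui-dt-data")
by_css = soup.select("tbody.yui-dt-data")  # select() returns a list

# class_ matches a single token of a multi-valued class attribute,
# so the row with class="yui-dt-first yui-dt-even" is included here.
even_ids = [tr["id"] for tr in by_keyword.find_all("tr", class_="yui-dt-even")]

print(find_rows(HTML))
print(even_ids)
```

Because the lookup keys on the class rather than the per-table id, the same helper can be run unchanged across pages that share the class name.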