Extracting data between span tags with BeautifulSoup (Python)
I want to extract the text between span tags. Here is a sample of the HTML:
<p> <span class="html-italic">3-acetyl-</span> <span class="html-italic">(4-acetyl-5-(β</span> "-" <span class="html-italic">naphtyl)-4,5-dihydro-1,3,4-oxodiazol-2-yl)methoxy)-2h-chromen-2-one</span> "(" <b>5b</b> </p>
I need the full name:
3-acetyl-4-acetyl-5-(β-naphtyl)-4,5-dihydro-1,3,4-oxodiazol-2-yl)methoxy)-2h-chromen-2-one
(without 5b). I don't know how to extract the '-' between the second and third span tags. Also, the total number of span tags may vary, and a '-' can appear between any of the span tags. The code I wrote gives me only: 3-acetyl-4-acetyl-5-(β. Here is part of my code:
    p = soup.find("p")
    name = ""
    # Only <span> children are collected; bare text nodes between them are skipped
    for child in p.children:
        if child.name == "span":
            name += child.text
    print(name)
Any help is highly appreciated!
You can use CSS selectors.
>>> ''.join(i.text for i in soup.select('p > span'))
'3-acetyl-(4-acetyl-5-(βnaphtyl)-4,5-dihydro-1,3,4-oxodiazol-2-yl)methoxy)-2h-chromen-2-one'
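Note that the selector returns only the text inside the spans, so the '-' that sits as a bare text node between the second and third span is missing from the result (βnaphtyl instead of β-naphtyl). One possible way to keep it is to walk the direct children of the paragraph, collect both span text and text nodes, and stop at the last span so that the trailing "(" and the <b>5b</b> are left out. This is only a sketch: it assumes the quoted "-" and "(" in the sample are plain text nodes (as browser developer tools display them), that the name always ends with the last span, and the `html` string below is a cleaned-up stand-in for the real page.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical stand-in for the real page; the "-" and "(" are bare text nodes.
html = (
    '<p><span class="html-italic">3-acetyl-</span>'
    '<span class="html-italic">(4-acetyl-5-(β</span>-'
    '<span class="html-italic">naphtyl)-4,5-dihydro-1,3,4-oxodiazol-2-yl)'
    'methoxy)-2h-chromen-2-one</span>(<b>5b</b></p>'
)

soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

# Assumption: the name ends at the last <span> that is a direct child of <p>.
last_span = p.find_all("span", recursive=False)[-1]

parts = []
for child in p.children:
    if isinstance(child, NavigableString):
        # Text node between tags, e.g. the "-" joining two spans.
        parts.append(str(child).strip())
    elif child.name == "span":
        parts.append(child.get_text())
    if child is last_span:
        # Everything after the last span ("(" and <b>5b</b>) is ignored.
        break

name = "".join(parts)
print(name)
```

Because the loop handles any number of spans and any text nodes between them, it should also cope with paragraphs where the '-' appears in a different position, as long as the assumptions above hold.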
python beautifulsoup