Sunday 15 August 2010

Extracting data between span tags with BeautifulSoup (Python)




I am trying to extract the text between span tags. Here is a sample of the HTML:

<p> <span class="html-italic">3-acetyl-</span> <span class="html-italic">(4-acetyl-5-(β</span> "-" <span class="html-italic">naphtyl)-4,5-dihydro-1,3,4-oxodiazol-2-yl)methoxy)-2h-chromen-2-one</span> "(" <b>5b</b> </p>

I need the full name:

3-acetyl-4-acetyl-5-(β-naphtyl)-4,5-dihydro-1,3,4-oxodiazol-2-yl)methoxy)-2h-chromen-2-one (without 5b). I don't know how to extract the '-' between the second and third span tags. Also, the total number of span tags may vary, and the '-' can appear between any of them. The code I wrote gives me only: 3-acetyl-4-acetyl-5-(β. Here is part of my code:

    p = soup.find("p")
    name = ""
    # Concatenate the text of every direct <span> child of the paragraph
    for child in p.children:
        if child.name == "span":
            name += child.text
    print(name)

Any help is highly appreciated!

You can use CSS selectors to collect the text of the span tags:

    >>> ''.join(i.text for i in soup.select('p > span'))
    '3-acetyl-(4-acetyl-5-(βnaphtyl)-4,5-dihydro-1,3,4-oxodiazol-2-yl)methoxy)-2h-chromen-2-one'
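The selector above only joins the span text, so the '-' sitting between the second and third span is still missing. One possible way to keep it is to walk the direct children of the paragraph and also accept any bare text that lies directly between two span tags. This is only a sketch: the helper names `is_span` and `full_name` are made up for illustration, and stripping the double quotes assumes the `"-"` quotes in the sample are an artifact of how the HTML was copied rather than part of the name.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# The sample paragraph from the question, reproduced as posted.
HTML = (
    '<p> <span class="html-italic">3-acetyl-</span> '
    '<span class="html-italic">(4-acetyl-5-(β</span> "-" '
    '<span class="html-italic">naphtyl)-4,5-dihydro-1,3,4-oxodiazol-2-yl)'
    'methoxy)-2h-chromen-2-one</span> '
    '"(" <b>5b</b> </p>'
)


def is_span(node):
    """Return True if node is a <span> tag (hypothetical helper)."""
    return isinstance(node, Tag) and node.name == "span"


def full_name(p):
    """Join span text plus any bare text found directly between two spans."""
    children = list(p.children)
    parts = []
    for i, child in enumerate(children):
        if is_span(child):
            parts.append(child.get_text())
        elif isinstance(child, NavigableString):
            # Keep the text only when both neighbours are spans, so the
            # trailing '(' in front of <b>5b</b> is left out.
            between_spans = (
                0 < i < len(children) - 1
                and is_span(children[i - 1])
                and is_span(children[i + 1])
            )
            if between_spans:
                # Assumption: surrounding whitespace and quotes are not
                # part of the name.
                parts.append(str(child).strip().strip('"'))
    return "".join(parts)


soup = BeautifulSoup(HTML, "html.parser")
name = full_name(soup.find("p"))
print(name)
```

Because the check looks at the neighbours of each text node rather than at fixed positions, it should keep working when the number of span tags varies or when the '-' shows up between a different pair of spans.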

python beautifulsoup
