python - Regular Expression Processing HTML -
i need replace html tags (e.g. <p>
, <img>
, etc.) in web page source code, want maintain <br>
, <br/>
. have tried:
re.sub(r'<[^>]+?>', u'', html, flags=re.i)
this achieves first goal, cannot maintain <br>
or <br/>
. r'<[^>br]+?>'
wont accomplish goal either.
what right regular expression?
<((?!\bbr\b).)*?>
this should work case.the negative lookahead ensure <br>
not picked.
edit:
<(?:(?!\bbr\/?(?=>)).)*?>
try if have such absurd things. <a href="http://host.domain.tld/br">
see demo.
http://regex101.com/r/su3fa2/57
python html regex
No comments:
Post a Comment