Tuesday 15 April 2014

linux - find links with regex




I'm trying to learn Linux commands and regular expressions, and I'm stuck on a little problem. I'm trying to find a series of links within a file using sed and regular expressions. Can you help me work out where I'm going wrong? The links are:

<a href="../a-lot-of-different/words-that/should-link.html">useful links</a>
<a href="..//a-lot-of-different/words-that/should-find-lots-of-links.html">multiple links</a>
<a href="../another-word-and-links/multiple-words/sjshfi-dfg.html">more links</a>

This is what I have:

sed -n '/<a*href=”^[../"]*\([a-z]*\)^[.html](["]*\)/p' /file > newfile

Regular expressions are less than ideal for parsing HTML.

You didn't show the desired output. I'm guessing that you want to extract the links. If so, try:

$ sed -rn 's/.*<a\s+href="([^"]*)".*/\1/p' file
../a-lot-of-different/words-that/should-link.html
..//a-lot-of-different/words-that/should-find-lots-of-links.html
../another-word-and-links/multiple-words/sjshfi-dfg.html
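Because this `sed` substitution replaces each whole line with a single capture, it can print at most one link per input line; the three links in its output suggest the test file had one anchor per line. If the anchors might share a line, one option is to let `grep -o` split out every match first and then strip each one with `sed`. This is a sketch, not the answer's method: it assumes a `grep` that supports `-o` and `-E` (GNU and BSD both do), and `links.html` is a hypothetical file name.

```shell
#!/bin/sh
# Hypothetical sample file: all three anchors on ONE line.
cat > links.html <<'EOF'
<a href="../a-lot-of-different/words-that/should-link.html">useful links</a> <a href="..//a-lot-of-different/words-that/should-find-lots-of-links.html">multiple links</a> <a href="../another-word-and-links/multiple-words/sjshfi-dfg.html">more links</a>
EOF

# grep -o prints every non-overlapping match on its own output line,
# so each opening <a href="..." becomes a separate line.
grep -oE '<a[[:space:]]+href="[^"]*"' links.html |
  # Keep only the quoted URL (capture group 1).
  sed -E 's/.*href="([^"]*)"/\1/'
```

This prints the three URLs, one per line. It is still a regular expression, so the caveat about HTML applies: an anchor with another attribute before `href`, or with a single-quoted or unquoted value, is not matched.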

How it works:

.*<a\s+href="

This matches everything before the link.
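The leading `.*` is greedy: it takes as much of the line as it can while still letting the rest of the pattern match. A quick way to see this, assuming GNU sed (the `-r` flag and `\s` used in the answer are GNU extensions), is to feed the same command a hypothetical line holding two anchors:

```shell
#!/bin/sh
# Hypothetical one-line input containing two anchors.
line='<a href="first.html">one</a> <a href="second.html">two</a>'

# Greedy .* swallows the first anchor, so the capture lands
# on the LAST href on the line.
printf '%s\n' "$line" | sed -rn 's/.*<a\s+href="([^"]*)".*/\1/p'
# prints: second.html
```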

([^"]*)

This matches the link and captures it in group \1.
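Each parenthesised group in an extended regex is numbered from the left, and the replacement can use the groups in any order. As an illustration only (again assuming GNU sed), a second group `([^<]*)` can capture the link text as well, here using one of the question's anchors:

```shell
#!/bin/sh
line='<a href="../another-word-and-links/multiple-words/sjshfi-dfg.html">more links</a>'

# Group 1: the href value. Group 2: the link text up to the next '<'.
# The replacement prints them as "text: url".
printf '%s\n' "$line" |
  sed -rn 's/.*<a\s+href="([^"]*)">([^<]*)<\/a>.*/\2: \1/p'
# prints: more links: ../another-word-and-links/multiple-words/sjshfi-dfg.html
```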

".*

This matches the double-quote after the link and everything that follows.
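The trailing `.*` matters because `s` only replaces the part of the line that matched; anything left unmatched is printed unchanged. Comparing the command with and without it on one of the question's anchors (assuming GNU sed) makes the difference visible:

```shell
#!/bin/sh
line='<a href="../a-lot-of-different/words-that/should-link.html">useful links</a>'

# With the trailing .* the whole line is replaced by the capture.
printf '%s\n' "$line" | sed -rn 's/.*<a\s+href="([^"]*)".*/\1/p'
# prints: ../a-lot-of-different/words-that/should-link.html

# Without it, the unmatched remainder '>useful links</a>' is kept.
printf '%s\n' "$line" | sed -rn 's/.*<a\s+href="([^"]*)"/\1/p'
# prints: ../a-lot-of-different/words-that/should-link.html>useful links</a>
```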

