Tuesday 15 September 2015

python - How to use Beautiful Soup to return the destination from HTML anchor tags




I am using Python 2 and Beautiful Soup to parse HTML retrieved with the requests module:

    import requests
    from bs4 import BeautifulSoup

    # Fetch the page and collect every <a> tag in it
    site = requests.get("http://www.stackoverflow.com/")
    html = site.text
    links = BeautifulSoup(html, "html.parser").find_all('a')

This returns a list whose items look like <a href="hereorthere.com">navigate</a>

The content of the href attribute of each anchor tag can take several forms, for example a JavaScript call on the page, a relative address to a page on the same domain (/next/one/file.php), or a specific web address (http://www.stackoverflow.com/).

Using BeautifulSoup, is it possible to return the web addresses of both the relative and the specific links in one list, excluding JavaScript calls and the like, so that only navigable links are left?

From the BS docs:

One common task is extracting all the URLs found within a page's <a> tags:

    for link in soup.find_all('a'):
        print(link.get('href'))
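The loop from the docs prints every href as-is, including JavaScript calls and relative paths. One possible way to get only navigable links is to resolve each href against the page URL and keep the results whose scheme is http or https. The sketch below is an illustration, not something the Beautiful Soup docs prescribe: the helper names navigable_links and links_from_html are hypothetical, and it assumes Python 3's urllib.parse (on Python 2, urljoin and urlparse live in the urlparse module instead).

```python
from urllib.parse import urljoin, urlparse


def navigable_links(hrefs, base_url):
    """Resolve hrefs against base_url and keep only http(s) URLs.

    Hypothetical helper: hrefs is any iterable of href values, such as
    those returned by link.get('href'), which may include None.
    """
    result = []
    for href in hrefs:
        if not href:  # skip <a> tags with a missing or empty href
            continue
        # urljoin turns relative paths into absolute URLs and leaves
        # hrefs that already carry their own scheme unchanged
        absolute = urljoin(base_url, href.strip())
        # javascript:, tel:, mailto: and similar schemes are dropped here
        if urlparse(absolute).scheme in ("http", "https"):
            result.append(absolute)
    return result


def links_from_html(html, base_url):
    """Collect navigable links from an HTML string (hypothetical helper)."""
    # Imported here so navigable_links can be used without bs4 installed
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return navigable_links((a.get("href") for a in soup.find_all("a")), base_url)
```

With the question's variables, this would be called as links_from_html(site.text, site.url). Note that an href without a scheme, such as hereorthere.com, is treated as a path relative to the page, which is also how a browser would interpret it.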

python beautifulsoup
