python - How to use Beautiful Soup to return the destination from HTML anchor tags
I am using Python 2 and Beautiful Soup to parse HTML retrieved using the requests module:
    import requests
    from bs4 import BeautifulSoup

    site = requests.get("http://www.stackoverflow.com/")
    html = site.text
    links = BeautifulSoup(html).find_all('a')
This returns a list whose items look like <a href="hereorthere.com">navigate</a>.
The content of the href attribute of each anchor tag can take several forms: for example, it could be a JavaScript call on the page, a relative address to a page on the same domain (/next/one/file.php), or a specific web address (http://www.stackoverflow.com/).
Using BeautifulSoup, is it possible to return the web addresses of both the relative and the specific addresses in one list, excluding JavaScript calls and the like, and leaving only navigable links?
From the BS docs:
One common task is extracting the URLs found within a page’s <a> tags:

    for link in soup.find_all('a'):
        print(link.get('href'))
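The docs snippet prints every href as-is, so JavaScript calls and relative addresses still come through unchanged. One possible way to get a single list of navigable links is to resolve each href against the page URL with the standard library's urljoin and keep only http/https results. This is a minimal sketch, not a Beautiful Soup feature: the function name navigable_links and the sample list are hypothetical, and it assumes that "navigable" means an http or https address.

```python
try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse      # Python 2


def navigable_links(hrefs, base_url):
    """Return absolute http(s) URLs built from raw href values.

    hrefs    -- iterable of href strings (None or empty entries are skipped)
    base_url -- URL of the page the hrefs came from, used to resolve
                relative addresses
    """
    result = []
    for href in hrefs:
        if not href:
            continue  # anchor without an href, or an empty one
        # Relative addresses become absolute; absolute ones are unchanged.
        absolute = urljoin(base_url, href.strip())
        # Assumption: only http/https counts as navigable, which drops
        # javascript:, mailto:, tel: and similar schemes.
        if urlparse(absolute).scheme in ("http", "https"):
            result.append(absolute)
    return result


# Hypothetical href values of the kinds described in the question.
sample = [
    "/next/one/file.php",
    "http://www.stackoverflow.com/",
    "javascript:void(0)",
    None,
]
links = navigable_links(sample, "http://www.stackoverflow.com/")
```

With the soup object from the question, the href values could be supplied as (link.get('href') for link in soup.find_all('a')), with the URL passed to requests.get as base_url.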
python beautifulsoup