python - How to get page title in requests
What is the simplest way to get the title of a page with Requests?
import requests

r = requests.get('http://www.imdb.com/title/tt0108778/')
# r.title ?  (Response has no such attribute; the desired result is:)
# Friends (TV Series 1994–2004) - IMDb
You need an HTML parser to parse the HTML response and get the title tag's text:
Example using lxml.html:
>>> import requests
>>> from lxml.html import fromstring
>>> r = requests.get('http://www.imdb.com/title/tt0108778/')
>>> tree = fromstring(r.content)
>>> tree.findtext('.//title')
u'Friends (TV Series 1994\u20132004) - IMDb'
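If you would rather not install lxml, the standard library's html.parser can extract the title too. This is a minimal sketch assuming Python 3; the TitleParser class and page_title() function are hypothetical names, not part of Requests or lxml. It keeps only the text of the first <title> element and collapses surrounding whitespace.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.found = False       # set once a <title> start tag is seen
        self._in_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase; only the first <title> counts.
        if tag == "title" and not self.found:
            self.found = True
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        if not self.found:
            return None
        # Collapse the newlines/indentation that often surround title text.
        return " ".join("".join(self._parts).split())


def page_title(html):
    """Return the page title as a str, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

With Requests you would pass the decoded body, e.g. page_title(r.text). Note that r.text relies on the encoding Requests infers from the response headers, while the lxml example above hands lxml the raw bytes in r.content.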
There are other options as well, for example, the mechanize library:
>>> import mechanize
>>> br = mechanize.Browser()
>>> br.open('http://www.imdb.com/title/tt0108778/')
>>> br.title()
'Friends (TV Series 1994\xe2\x80\x932004) - IMDb'
Which alternative to choose depends on what you are going to do next: parse the page for more data, or perhaps interact with it: click buttons, submit forms, follow links, etc.
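If the next step is following links rather than just reading the title, a single parsing pass can gather both. The sketch below is a hypothetical stdlib-only helper (the LinkAndTitleParser class, summarize() function and base_url parameter are illustrative names, not from any library), assuming Python 3; mechanize provides its own link-following methods if you go that route instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Hypothetical helper: gathers the <title> text and every <a href> in one pass."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title_parts = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)

    @property
    def title(self):
        return " ".join("".join(self.title_parts).split())


def summarize(html, base_url):
    """Return (title, absolute_links) for an HTML document."""
    parser = LinkAndTitleParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.title, parser.links
```

Resolving links with urljoin at collection time means each entry can be passed straight to requests.get() for the next hop.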
Besides, you may want to use an API provided for IMDb instead of going down to HTML parsing.
Example usage of the IMDbPY package:
>>> from imdb import IMDb
>>> ia = IMDb()
>>> movie = ia.get_movie('0108778')
>>> movie['title']
u'Friends'
>>> movie['series years']
u'1994-2004'
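The ID passed to get_movie() above is the digits of the tt0108778 segment in the question's URL. If you start from a URL, a small helper can pull it out. This is a hypothetical sketch (imdbpy_movie_id is not part of IMDbPY), assuming Python 3 and title URLs of the form .../title/tt<digits>/.

```python
import re

# Assumed URL shape: .../title/tt0108778/ ; capture the digits after "tt".
_TITLE_ID = re.compile(r"/title/tt(\d+)")


def imdbpy_movie_id(url):
    """Hypothetical helper: extract the numeric IMDb title ID from a URL."""
    match = _TITLE_ID.search(url)
    if match is None:
        raise ValueError(f"no IMDb title ID in {url!r}")
    return match.group(1)
```

Usage: ia.get_movie(imdbpy_movie_id('http://www.imdb.com/title/tt0108778/')).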