Wednesday 15 June 2011

python - How to get page title in requests -




What is the simplest way to get the title of a page in requests?

>>> r = requests.get('http://www.imdb.com/title/tt0108778/')
>>> # ? r.title
Friends (TV Series 1994–2004) - IMDb

You need an HTML parser to parse the HTML response and get the title tag's text.

Example using lxml.html:

>>> import requests
>>> from lxml.html import fromstring
>>> r = requests.get('http://www.imdb.com/title/tt0108778/')
>>> tree = fromstring(r.content)
>>> tree.findtext('.//title')
u'Friends (TV Series 1994\u20132004) - IMDb'
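If installing lxml is not an option, the same idea can be sketched with only the Python 3 standard library. This is a minimal sketch, not part of the original answer: the names `TitleParser` and `get_title` are hypothetical, and it assumes you pass in the page's already-decoded HTML text.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False  # True while we are between <title> and </title>
        self._done = False      # True once the first title has been closed
        self._chunks = []       # text fragments seen inside the title

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore any later <title> elements.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self):
        text = "".join(self._chunks).strip()
        return text or None


def get_title(html_text):
    """Return the page title from an HTML string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return parser.title
```

With requests, this would be called as `get_title(r.text)`, where `r` is the response from `requests.get(...)`. It is less forgiving of badly broken markup than lxml, so treat it as a fallback rather than a replacement.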

There are other options, for example the mechanize library:

>>> import mechanize
>>> br = mechanize.Browser()
>>> br.open('http://www.imdb.com/title/tt0108778/')
>>> br.title()
'Friends (TV Series 1994\xe2\x80\x932004) - IMDb'

Which option to choose depends on what you are going to do next: parse the page to get more data, or perhaps interact with it (click buttons, submit forms, follow links, etc.).

Besides, you may want to use an API provided by IMDb instead of going down to HTML parsing. See:

Does IMDb provide an API?
IMDbPY

Example usage of the IMDbPY package:

>>> from imdb import IMDb
>>> ia = IMDb()
>>> movie = ia.get_movie('0108778')
>>> movie['title']
u'Friends'
>>> movie['series years']
u'1994-2004'

python html html-parsing
