Saturday 15 August 2015

python - Need help with this Scrapy regular expression




I'm pretty new to Scrapy and am trying to crawl a website using CrawlSpider. I want to crawl recursively by following the "next" button, but it is not working. I think the problem comes from the regular expression; I have checked it many times and cannot find the mistake. The spider crawls only the landing page and does not proceed to the next page.

    # -*- coding: utf-8 -*-
    start_urls = ['https://shopping.yahoo.com/merchantrating/?mid=13652']

    rules = (
        Rule(LinkExtractor(allow="/merchantrating/;_ylt=anf3hf19r8mgfpwuyujuny4ceb0f\?mid=13652&sort=1&start=\d+"),
             callback='parse_start_url', follow=True),
    )

    def parse_start_url(self, response):
        sel = Selector(response)
        contents = sel.xpath('//p')
        for content in contents:
            item = bedbugsitem()
            item['pagecontent'] = content.xpath('text()').extract()
            self.items.append(item)
        return self.items

Use XPath to locate the "next" link instead of matching its URL with a regular expression:

    rules = (
        Rule(LinkExtractor(restrict_xpaths=["//div[@class='pagination']//a[contains(., 'next')]"]),
             callback='parse_start_url', follow=True),
    )
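
One possible reason the regex rule finds nothing is the hard-coded "_ylt" token in the pattern: if the site writes a different token into its "next" links, the pattern can never match them. Whether the token really varies is an assumption here, not something confirmed in the question. The sketch below uses only the standard `re` module and made-up URLs to show the effect. It also assumes the `allow` pattern is searched for inside each link's URL; the `allowed` helper and the `LOOSE` pattern are hypothetical names for illustration, not part of Scrapy.

```python
import re

# Pattern from the question, with the "_ylt" token hard-coded.
STRICT = r"/merchantrating/;_ylt=anf3hf19r8mgfpwuyujuny4ceb0f\?mid=13652&sort=1&start=\d+"

# Hypothetical looser pattern: accept any ";_ylt=..." segment, or none at all.
LOOSE = r"/merchantrating/(;_ylt=[^?]*)?\?mid=13652&sort=1&start=\d+"

# Made-up "next" links for illustration; the second token value is invented.
same_token = ("https://shopping.yahoo.com/merchantrating/"
              ";_ylt=anf3hf19r8mgfpwuyujuny4ceb0f?mid=13652&sort=1&start=10")
other_token = ("https://shopping.yahoo.com/merchantrating/"
               ";_ylt=XXXXXXXX?mid=13652&sort=1&start=10")


def allowed(pattern, url):
    """Return True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


print(allowed(STRICT, same_token))   # True: the token matches exactly
print(allowed(STRICT, other_token))  # False: a different token breaks the match
print(allowed(LOOSE, other_token))   # True: the token is no longer pinned
```

The XPath rule above sidesteps the issue because it selects the link by its position and text on the page, so the exact URL of the "next" link does not matter.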

python regex scrapy
