Python - Scrapy accidentally over-writing items when running concurrently?
I have been running a Scrapy scraper, and I noticed that about 10% of the time it returns duplicate results. In other words, it assigns the results of one item to another item.
I assume it has something to do with concurrency or global variables, but I'm not sure what. I have set a 250 ms delay between requests, but it looks as though results are still being returned in parallel and are accidentally over-writing each other.
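A delay alone may not serialise the crawl. As far as I understand Scrapy's downloader, `DOWNLOAD_DELAY` only spaces out when consecutive requests to the same site *start*, while up to `CONCURRENT_REQUESTS_PER_DOMAIN` (8 by default) requests can still be in flight at once. A minimal `settings.py` sketch for testing whether overlapping requests are to blame, assuming the project otherwise runs on default settings:

```python
# settings.py (sketch): values for ruling concurrency in or out as the
# cause of the duplicated items.

# Wait roughly 0.25 s between consecutive requests to the same site.
# Scrapy randomises the actual wait (0.5x to 1.5x of this value) unless
# RANDOMIZE_DOWNLOAD_DELAY is set to False.
DOWNLOAD_DELAY = 0.25

# The delay spaces out request start times, but a slow response can still
# be in flight when the next request begins. Allowing only one request per
# domain in the downloader at a time removes that overlap (at a speed cost).
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```

If the duplicates disappear with this setting, the problem is an interaction between overlapping requests rather than a bug in `parse` itself.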
This is my spider code:
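    from scrapy import FormRequest
    from scrapy.exceptions import DropItem
    from scrapy.utils.project import get_project_settings
    # MyItem is the item class defined in my project's items module.

    def start_requests(self):
        settings = get_project_settings()
        ids = settings.get('ids', None)
        for i, id in enumerate(ids):
            # One form POST per id; the id is also passed along in meta.
            yield FormRequest(
                url=self._form_url,
                formdata={'id': id},
                meta={'id': id},
            )

    def parse(self, response):
        addr_xpath = '//div[@class="w80p left floatright"]//text()'
        addresses = response.xpath(addr_xpath).extract()
        if not addresses:
            raise DropItem("can't find address")
        item = MyItem()
        item['address'] = ', '.join(addresses)
        return item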
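One hypothesis worth checking, stated as an assumption because nothing in the question confirms how the site works: if the form POST stores the submitted `id` in a server-side session tied to a cookie, and the address page is then rendered from that session (for example after a redirect), concurrent requests that share Scrapy's single default cookie jar can read each other's session. The toy model below uses no Scrapy at all; `FakeSessionSite`, `crawl` and the address data are hypothetical. It forces every request to overlap, so it exaggerates the roughly 10% rate seen in practice.

```python
# Toy, pure-Python model of shared session state swapping results between
# overlapping requests. The site behaviour is an assumption for illustration.


class FakeSessionSite:
    """Hypothetical site: a POST stores the id in a per-cookie session, and
    the result page (fetched afterwards) reads the id back from the session."""

    def __init__(self, addresses):
        self.addresses = addresses  # id -> address (made-up data)
        self.sessions = {}          # cookie -> id most recently submitted

    def post_form(self, cookie, form_id):
        self.sessions[cookie] = form_id

    def get_result(self, cookie):
        return self.addresses[self.sessions[cookie]]


def crawl(site, ids, cookie_for):
    """Mimic overlapping requests: every POST is sent before any result page
    comes back, then map each requested id to the address it received."""
    for i, form_id in enumerate(ids):
        site.post_form(cookie_for(i), form_id)
    return {form_id: site.get_result(cookie_for(i))
            for i, form_id in enumerate(ids)}


addresses = {"a": "1 First St", "b": "2 Second St", "c": "3 Third St"}

# One shared cookie jar (Scrapy's default): the last POST wins for everyone.
shared = crawl(FakeSessionSite(addresses), ["a", "b", "c"], lambda i: "jar")
# One cookie jar per request, analogous to meta={'cookiejar': i} in Scrapy.
separate = crawl(FakeSessionSite(addresses), ["a", "b", "c"],
                 lambda i: f"jar-{i}")

print(shared)    # {'a': '3 Third St', 'b': '3 Third St', 'c': '3 Third St'}
print(separate)  # {'a': '1 First St', 'b': '2 Second St', 'c': '3 Third St'}
```

If this is what is happening, Scrapy's `cookiejar` request meta key (e.g. `meta={'id': id, 'cookiejar': i}`) keeps a separate cookie session per request, which corresponds to the `separate` case. Copying `response.meta['id']` into the item would also make it easy to see which id received which address.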
What am I doing wrong?
python web-scraping scrapy