Sunday 15 July 2012

Python Scrapy tutorial KeyError: 'Spider not found: juno'




I'm trying to write my first Scrapy spider. I've been following the tutorial at http://doc.scrapy.org/en/latest/intro/tutorial.html but I'm getting the error "KeyError: 'Spider not found: juno'".

I think I'm running the command from the right directory (the one with the scrapy.cfg file):

(proscraper)#( 10/14/14@ 2:06pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy tree
.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

2 directories, 7 files

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy ls
scrapy  scrapy.cfg

Here is the error I'm getting:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy scrapy crawl juno
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
  verifyHostname, VerificationError = _selectVerifyImplementation()
Traceback (most recent call last):
  File "/home/tim/.virtualenvs/proscraper/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 58, in run
    spider = crawler.spiders.create(spname, **opts.spargs)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/spidermanager.py", line 44, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: juno'

This is my virtualenv:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy pip freeze
Scrapy==0.24.4
Twisted==14.0.2
cffi==0.8.6
cryptography==0.6
cssselect==0.9.1
ipdb==0.8
ipython==2.3.0
lxml==3.4.0
pyOpenSSL==0.14
pycparser==2.10
queuelib==1.2.2
six==1.8.0
w3lib==1.10.0
wsgiref==0.1.2
zope.interface==4.1.1

And here is the code for the spider, with the name attribute filled in:

(proscraper)#( 10/14/14@ 2:14pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy cat scrapy/spiders/juno_spider.py
import scrapy

class JunoSpider(scrapy.Spider):
    name = "juno"
    allowed_domains = ["http://www.juno.co.uk/"]
    start_urls = [
        "http://www.juno.co.uk/dj-equipment/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)

When you start a project with scrapy as the project name, it creates the directory structure you printed:

.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

But using scrapy as the project name has a side effect. If you open the generated scrapy.cfg, you will see that the default settings entry points to your scrapy.settings module:

[settings]
default = scrapy.settings

And when you cat the scrapy/settings.py file you see:

BOT_NAME = 'scrapy'
SPIDER_MODULES = ['scrapy.spiders']
NEWSPIDER_MODULE = 'scrapy.spiders'

Well, nothing unusual here: the bot name, the list of modules where Scrapy will look for spiders, and the module where new spiders will be created by the genspider command. So far, so good.

Now let's check the Scrapy library itself. It has been installed under the proscraper isolated virtualenv, in the /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy directory. Remember that site-packages is added to sys.path, which contains the paths where Python searches for modules. So, guess what... the Scrapy library has its own settings module at /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings, which imports /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings/default_settings.py, the file that holds the default values for all settings. Pay special attention to its default SPIDER_MODULES entry:

SPIDER_MODULES = []
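You can confirm this default from a Python shell inside the virtualenv. A minimal sketch, assuming the module layout shipped with Scrapy 0.24:

# confirm the library default that the crawl command ends up using
from scrapy.settings import default_settings

print(default_settings.__file__)        # .../site-packages/scrapy/settings/default_settings.py
print(default_settings.SPIDER_MODULES)  # -> []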

Maybe you are starting to see what is happening. Choosing scrapy as the project name also generated a scrapy.settings module that clashes with the Scrapy library's own scrapy.settings. And here the order in which the corresponding paths were inserted into sys.path makes Python import one or the other: the first to appear wins. In this case the Scrapy library's settings win, with its empty SPIDER_MODULES list, and hence the KeyError: 'Spider not found: juno'.
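The first-match rule is easy to demonstrate in isolation. The sketch below is hypothetical and independent of Scrapy: it puts two modules with the same name on sys.path and shows that the earlier entry wins.

import os
import sys
import tempfile

# two directories, each providing a module named "shadowed"
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
with open(os.path.join(first, "shadowed.py"), "w") as f:
    f.write("WHO = 'first dir'\n")
with open(os.path.join(second, "shadowed.py"), "w") as f:
    f.write("WHO = 'second dir'\n")

sys.path.insert(0, second)
sys.path.insert(0, first)   # "first" now precedes "second" on sys.path

import shadowed
print(shadowed.WHO)         # prints 'first dir': the earlier sys.path entry wins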

To solve the conflict we can rename the project folder to another name, let's say scrap:

.
├── scrap
│   ├── __init__.py

Then modify scrapy.cfg to point to the proper settings module:

[settings]
default = scrap.settings

And update scrap/settings.py to point to the proper spiders module:

SPIDER_MODULES = ['scrap.spiders']
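After the rename you can sanity-check the wiring with Scrapy's own settings loader. A minimal sketch, run from the directory containing scrapy.cfg:

# load the project settings the same way the scrapy command does
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
print(settings.get('SPIDER_MODULES'))  # should now print ['scrap.spiders']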

But, as @paultrmbrth suggested, the simplest fix is to recreate the project with a different name.

Labels: python, scrapy
