Python Scrapy tutorial KeyError: 'Spider not found: juno'
I'm trying to write my first Scrapy spider. I've been following the tutorial at http://doc.scrapy.org/en/latest/intro/tutorial.html but I'm getting the error "KeyError: 'Spider not found: juno'".
I think I'm running the command from the right directory (the one with the scrapy.cfg file):
(proscraper)#( 10/14/14@ 2:06pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy
   tree
.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

2 directories, 7 files
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy
   ls
scrapy  scrapy.cfg
Here is the error I'm getting:
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy
   scrapy crawl juno
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
  verifyHostname, VerificationError = _selectVerifyImplementation()
Traceback (most recent call last):
  File "/home/tim/.virtualenvs/proscraper/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 58, in run
    spider = crawler.spiders.create(spname, **opts.spargs)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/spidermanager.py", line 44, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: juno'
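For context, the KeyError comes from Scrapy's spider manager, which walks the modules listed in SPIDER_MODULES, registers every spider it finds under its name attribute, and raises when the requested name was never registered. The following is only an illustrative sketch of that lookup logic, not Scrapy's actual code:

```python
# Simplified sketch of how "scrapy crawl <name>" resolves a spider.
# Illustrative only -- the real SpiderManager imports each module listed
# in SPIDER_MODULES and collects the Spider subclasses defined there.

class SpiderManager(object):
    def __init__(self, spider_classes):
        # Map each spider's ``name`` attribute to its class.
        self._spiders = {cls.name: cls for cls in spider_classes}

    def create(self, spider_name):
        try:
            return self._spiders[spider_name]()
        except KeyError:
            # The error seen above: the requested name was never registered.
            raise KeyError("Spider not found: %s" % spider_name)


class JunoSpider(object):
    name = "juno"


# Normal case: the spider's module was scanned, so the name resolves.
manager = SpiderManager([JunoSpider])
print(type(manager.create("juno")).__name__)  # prints: JunoSpider

# Failure case: no spider modules were scanned (SPIDER_MODULES was empty).
try:
    SpiderManager([]).create("juno")
except KeyError as e:
    print(e)  # prints: 'Spider not found: juno'
```

So the error means the crawl command simply never saw the spiders package, which is what the answer below diagnoses.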
This is my virtualenv:
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy
   pip freeze
Scrapy==0.24.4
Twisted==14.0.2
cffi==0.8.6
cryptography==0.6
cssselect==0.9.1
ipdb==0.8
ipython==2.3.0
lxml==3.4.0
pyOpenSSL==0.14
pycparser==2.10
queuelib==1.2.2
six==1.8.0
w3lib==1.10.0
wsgiref==0.1.2
zope.interface==4.1.1
And here is the code of the spider, with the name attribute filled in:
(proscraper)#( 10/14/14@ 2:14pm )( tim@localhost ):~/workspace/development/hacks/prosum-scraper/scrapy
   cat scrapy/spiders/juno_spider.py
import scrapy

class JunoSpider(scrapy.Spider):
    name = "juno"
    allowed_domains = ["http://www.juno.co.uk/"]
    start_urls = [
        "http://www.juno.co.uk/dj-equipment/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)
When you start a project using scrapy as the project name, it creates the directory structure you printed:
.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg
But using scrapy as the project name has a side effect. If you open the generated scrapy.cfg you will see that its default settings entry points to your scrapy.settings module:
[settings]
default = scrapy.settings
And when you cat the scrapy/settings.py file you see:
BOT_NAME = 'scrapy'

SPIDER_MODULES = ['scrapy.spiders']
NEWSPIDER_MODULE = 'scrapy.spiders'
Well, nothing unusual here: the bot name, the list of modules where Scrapy will look for spiders, and the module where new spiders will be created by the genspider command. So far, so good.
Now let's check the Scrapy library itself. It has been installed under the proscraper isolated virtualenv, in the /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy directory. Remember that site-packages is added to sys.path, the list of paths Python searches when importing modules. So, guess what... the Scrapy library also has a settings module, /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings, which imports /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings/default_settings.py, where the default values of all settings live. Pay special attention to its default SPIDER_MODULES entry:

SPIDER_MODULES = []
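The underlying clash can be reproduced without Scrapy at all: when two packages share a name, Python imports whichever one it finds first on sys.path. Here is a small self-contained demonstration; the package name pkg and the temporary directories are just placeholders for the project package and the library package:

```python
import os
import sys
import tempfile

# Build two directories that each contain a package named ``pkg``,
# mimicking a project package that shares its name with a library.
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()

for path, marker in ((first, "library"), (second, "project")):
    os.mkdir(os.path.join(path, "pkg"))
    with open(os.path.join(path, "pkg", "__init__.py"), "w") as f:
        f.write("origin = %r\n" % marker)

# The entry that appears earlier in sys.path wins; the other is shadowed.
sys.path.insert(0, second)   # the "project" copy
sys.path.insert(0, first)    # the "library" copy, now in front of it

import pkg
print(pkg.origin)  # prints: library
```

The "project" copy is still on disk and still on sys.path, but it is completely invisible to the import system, exactly like the project's scrapy.settings in this question.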
Maybe you are starting to see what is happening. Choosing scrapy as the project name generated a scrapy.settings module that clashes with the Scrapy library's own scrapy.settings. The order in which the corresponding paths were inserted into sys.path decides which one Python imports: the first one found wins. In this case the library's settings wins, so SPIDER_MODULES stays empty, and hence KeyError: 'Spider not found: juno'.
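A quick way to confirm which copy of a module Python is actually importing is to check its __file__ attribute. The helper below (module_origin is just an illustrative name, not a Scrapy or stdlib API) is demonstrated on a standard-library package; inside the broken project you would call it with "scrapy.settings" and see the library's site-packages path instead of your project's:

```python
import importlib

def module_origin(name):
    """Return the file path that a module name resolves to on sys.path."""
    mod = importlib.import_module(name)
    return getattr(mod, "__file__", "<builtin>")

# Demonstrated on the stdlib ``json`` package; in the broken project,
# module_origin("scrapy.settings") would point into site-packages.
print(module_origin("json"))
```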
To solve the conflict without recreating the project, rename the project folder to another name, let's say scrap:

.
├── scrap
│   ├── __init__.py
...
Then modify scrapy.cfg to point to the proper settings module:

[settings]
default = scrap.settings
And update scrap/settings.py to point to the proper spiders module:

SPIDER_MODULES = ['scrap.spiders']
But, as @paultrmbrth suggested, the simplest fix is to recreate the project with a different name.