Nokogiri - refactoring Ruby scraping code
Basically, I have multiple .main_entry blocks on each page, and I need to pull a couple of pieces of information from each. How can this be refactored into methods?
    require 'open-uri'
    require 'nokogiri'

    url = #url
    doc = Nokogiri::HTML(open(url))

    doc.css(".main_entry").each do |item|
      artist = item.at_css(".list_artist").text
      title = item.at_css(".list_album").text
      puts "#{artist} - #{title}"
    end
I have arrived at the mess below, which throws an "undefined local variable or method 'release'" error. The error seems to be related to the methods being overwritten. Can someone explain the process the code below goes through, why it breaks down, and how I should fix it? Should each .main_entry block be saved in some kind of cache or array first, before instantiating?
    require 'open-uri'
    require 'nokogiri'

    class Scraper
      def initialize(url)
        @url = url
      end

      def release
        @release ||= doc.css(".main_entry") || []
      end

      release.each do |item|
        define_method(:artist) do
          @artist ||= item.at_css(".list_artist").text
        end

        define_method(:title) do
          @title ||= item.at_css(".list_album").text
        end
      end

      private

      attr_reader :url

      def doc
        @doc ||= Nokogiri::HTML(open(url))
      end
    end

    scraper = Scraper.new( #url
    puts "#{scraper.artist} - #{scraper.title}"
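On why the code breaks: statements written directly in a class body run once, while the class is being defined, and self at that point is the class itself rather than an instance. The instance method release is therefore not callable there, which is where the error comes from. Even if that call worked, define_method inside the loop would replace artist and title on every pass, so only the last entry's versions would survive. Here is a minimal sketch of both effects; the Demo and Overwrite classes are hypothetical and not part of the original code.

```ruby
# Hypothetical classes, for illustration only.
class Demo
  def release
    [1, 2, 3]
  end

  # The class body executes with self == Demo (the class), not an instance.
  CLASS_BODY_SELF = self

  # Calling the instance method here fails; capture the error class.
  CLASS_BODY_ERROR =
    begin
      release
      nil
    rescue NameError => e
      e.class
    end
end

class Overwrite
  # Each pass redefines the same method, so only the last definition remains.
  %w[first second].each do |name|
    define_method(:label) { name }
  end
end

puts Demo::CLASS_BODY_SELF.inspect   # Demo
puts Demo::CLASS_BODY_ERROR.inspect  # NameError
puts Demo.new.release.inspect        # [1, 2, 3]
puts Overwrite.new.label             # second
```

So yes, the entries need to be collected first, and each one needs its own object rather than a shared pair of methods.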
Here is a suggestion:
    require 'open-uri'
    require 'nokogiri'

    # Wraps a single .main_entry node and exposes its fields.
    class ScrapedRelease
      attr_reader :item

      def initialize(item)
        @item = item
      end

      def artist
        @artist ||= item.at_css(".list_artist").text
      end

      def title
        @title ||= item.at_css(".list_album").text
      end
    end

    # Fetches the page and builds one ScrapedRelease per .main_entry block.
    class Scraper
      def initialize(url)
        @url = url
      end

      def releases
        @releases ||= (doc.css(".main_entry") || []).map { |item| ScrapedRelease.new(item) }
      end

      private

      attr_reader :url

      def doc
        # On Ruby 3.0+ this needs URI.open(url) instead of open(url).
        @doc ||= Nokogiri::HTML(open(url))
      end
    end
Then you can do:
    Scraper.new(url).releases.each do |release|
      puts "#{release.artist} - #{release.title}"
    end
ruby nokogiri