Saturday 15 June 2013

Nokogiri - refactoring Ruby scraping code




Basically, I have multiple .main_entry blocks on each page and need to pull a couple of pieces of info from each. How can this be refactored into methods?

    require 'open-uri'
    require 'nokogiri'

    url = #url
    doc = Nokogiri::HTML(open(url))

    doc.css(".main_entry").each do |item|
      artist = item.at_css(".list_artist").text
      title = item.at_css(".list_album").text
      puts "#{artist} - #{title}"
    end

I have arrived at the mess below, which throws an "undefined local variable or method 'release'" error that seems to be related to the methods being overwritten. Can someone explain the process the code below goes through, why it breaks down, and how I should fix it? Should each .main_entry block be saved in some kind of cache or array first, before instantiating?

    require 'open-uri'
    require 'nokogiri'

    class Scraper
      def initialize(url)
        @url = url
      end

      def release
        @release ||= doc.css(".main_entry") || []
      end

      release.each do |item|
        define_method(:artist) do
          @artist ||= item.at_css(".list_artist").text
        end

        define_method(:title) do
          @title ||= item.at_css(".list_album").text
        end
      end

      private

      attr_reader :url

      def doc
        @doc ||= Nokogiri::HTML(open(url))
      end
    end

    scraper = Scraper.new(url)
    puts "#{scraper.artist} - #{scraper.title}"
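As for why this version breaks: a class body runs once, while the class is being defined, and inside it self is the class itself rather than an instance. A bare call to an instance method there therefore fails with the error quoted above. Separately, calling define_method repeatedly with the same name simply replaces the previous definition, so only the last one survives. The sketch below shows both effects in isolation; Demo and Overwrite are hypothetical classes made up for illustration and are not part of the original code.

```ruby
# Hypothetical class, for illustration only.
class Demo
  def release
    [:a, :b]
  end

  # The class body executes at definition time; self is the class Demo.
  BODY_SELF = self

  # #release is an instance method, so the class Demo does not respond
  # to it. Calling it here raises NameError, which we capture.
  BODY_ERROR = begin
    release
    nil
  rescue NameError => e
    e.class
  end
end

# Hypothetical class: each pass through the loop redefines #label,
# so only the definition from the last iteration remains.
class Overwrite
  %w[first second].each do |name|
    define_method(:label) { name }
  end
end

puts Demo::BODY_SELF.inspect   # prints Demo
puts Demo::BODY_ERROR.inspect  # prints NameError
puts Overwrite.new.label       # prints second
```

So even if the loop could run, a single scraper object would end up with one artist and one title method bound to the last .main_entry block, which is why one object per block is the more natural shape.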

Here is a suggestion:

    require 'open-uri'
    require 'nokogiri'

    # Wraps a single .main_entry node and exposes its fields.
    class ScrapedRelease
      attr_reader :item

      def initialize(item)
        @item = item
      end

      def artist
        @artist ||= item.at_css(".list_artist").text
      end

      def title
        @title ||= item.at_css(".list_album").text
      end
    end

    # Fetches the page once and builds one ScrapedRelease per .main_entry block.
    class Scraper
      def initialize(url)
        @url = url
      end

      def releases
        @releases ||= (doc.css(".main_entry") || []).map { |item| ScrapedRelease.new(item) }
      end

      private

      attr_reader :url

      def doc
        # On Ruby 3.0+, use URI.open(url) here instead of open(url).
        @doc ||= Nokogiri::HTML(open(url))
      end
    end

Then you can do:

    Scraper.new(url).releases.each do |release|
      puts "#{release.artist} - #{release.title}"
    end

ruby nokogiri
