Microformat Ruby Parser
With help from Assaf, I’ve generalized the Microformat parsers even more. There is now a base class, Microformat, that you subclass to describe any microformat you want to parse.
As originally written, it wasn’t robust: it assumed that the important information is in the text of the tag and that nothing is in any of the tag’s attributes. Attribute support should probably be added, with each property given as an array that specifies whether its value comes from the text or from a named attribute.
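The array-based property spec suggested above could look something like the following. This is a minimal plain-Ruby sketch with no scraping library involved; the Element struct, the SPEC table and read_property are hypothetical names used only to show a property declaring whether its value comes from the text or from a named attribute.

```ruby
# Hypothetical stand-in for a parsed HTML element: tag name, attributes, text.
Element = Struct.new(:tag, :attrs, :text)

# Hypothetical property spec: each entry is [:text] or [:attr, "name"],
# saying where that property's value should be read from.
SPEC = {
  latitude:  [:text],
  longitude: [:text],
  url:       [:attr, "href"]
}.freeze

# Read one property from an element according to its spec entry.
def read_property(element, property)
  source, attr_name = SPEC.fetch(property)
  case source
  when :text then element.text.strip
  when :attr then element.attrs[attr_name]
  else
    raise ArgumentError, "unknown source #{source.inspect}"
  end
end

lat  = Element.new("span", { "class" => "latitude" }, " 37.386013 ")
link = Element.new("a", { "class" => "url", "href" => "http://example.com/" }, "Example")

puts read_property(lat, :latitude) # prints 37.386013
puts read_property(link, :url)     # prints http://example.com/
```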
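require "scrapi"  # the scrAPI gem, which provides Scraper::Base

class Microformat < Scraper::Base
  # Declare the properties a microformat has. Each symbol becomes a CSS
  # class selector (underscores turned into hyphens, so :street_address
  # matches class="street-address"). The extractors are tried in order:
  # an abbr's title, a link's href, then the element's text.
  def self.properties(*symbols)
    symbols.each do |symbol|
      html_class = symbol.to_s.tr("_", "-")
      process ".#{html_class}", symbol => ["abbr@title", "a@href", :text]
    end
  end
end

class Geo < Microformat
  properties :latitude, :longitude
end

class Adr < Microformat
  properties :post_office_box,
             :extended_address,
             :street_address,
             :locality,
             :region,
             :postal_code,
             :country_name
end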
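To make the naming rule in properties concrete without pulling in scrAPI, here is a minimal plain-Ruby sketch of the same class-macro pattern. MicroformatSketch, GeoSketch, AdrSketch and the rules list are hypothetical; the real class hands each selector to scrAPI’s process instead of just recording it.

```ruby
# Hypothetical, library-free stand-in for the Microformat base class:
# it only records which CSS class selector each property maps to.
class MicroformatSketch
  # Class-level list of [property, selector] rules; each subclass gets its own.
  def self.rules
    @rules ||= []
  end

  # Same naming rule as the scrAPI version: underscores become hyphens,
  # so :street_address is looked up under the class "street-address".
  def self.properties(*symbols)
    symbols.each do |symbol|
      html_class = symbol.to_s.tr("_", "-")
      rules << [symbol, ".#{html_class}"]
    end
  end
end

class GeoSketch < MicroformatSketch
  properties :latitude, :longitude
end

class AdrSketch < MicroformatSketch
  properties :post_office_box, :extended_address, :street_address,
             :locality, :region, :postal_code, :country_name
end

GeoSketch.rules.each { |prop, sel| puts "#{prop} -> #{sel}" }
# latitude -> .latitude
# longitude -> .longitude
p AdrSketch.rules.assoc(:country_name) # [:country_name, ".country-name"]
```

Because rules lives in a class-level instance variable, Geo-style and Adr-style subclasses each keep their own list, which is what lets one base class serve any number of microformats.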
To use them, you just write a general scraper that hands each microformat block to the matching class:
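class Location < Scraper::Base
  # Collect every match into an array rather than keeping only one.
  array :geos
  array :adrs
  # Hand each .adr / .geo element to the matching microformat scraper.
  process ".adr", :adrs => Adr
  process ".geo", :geos => Geo
  # Return an object exposing geos and adrs.
  result :geos, :adrs
end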
Scraping a page with Location returns an object whose geos and adrs hold the geo and adr microformats found on the page, respectively.
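The shape of that result can be modeled without any HTML parsing: every .geo block and every .adr block is collected into its own array, and the returned object exposes both. In this sketch the page is a hypothetical list of hashes, and LocationResult is an invented Struct standing in for the object scrAPI builds from result :geos, :adrs.

```ruby
# Hypothetical, library-free model of the Location scraper's output.
# A "page" here is just a list of blocks, each with a CSS class and the
# property values found inside it (a real scraper would parse HTML).
LocationResult = Struct.new(:geos, :adrs)

PAGE = [
  { class: "geo", values: { latitude: "37.386013", longitude: "-122.082932" } },
  { class: "adr", values: { locality: "Sunnyvale", region: "CA" } },
  { class: "geo", values: { latitude: "51.5", longitude: "-0.12" } }
].freeze

# Collect every block of each microformat type into its own array,
# the way `array :geos` plus `process ".geo", :geos => Geo` accumulate matches.
def scrape_location(page)
  geos = page.select { |block| block[:class] == "geo" }.map { |block| block[:values] }
  adrs = page.select { |block| block[:class] == "adr" }.map { |block| block[:values] }
  LocationResult.new(geos, adrs)
end

location = scrape_location(PAGE)
puts location.geos.size           # prints 2
puts location.adrs.first[:region] # prints CA
```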
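Update: Assaf pointed out that he added the ability to pull out the correct data when the value is stored in an abbr’s title attribute (or a link’s href) rather than in the text of a span. The process line in properties lists these extractors in order: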
process ".#{html_class}", symbol=>["abbr@title", "a@href", :text]
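The extractor list reads as a fallback chain: try the abbr’s title, then a link’s href, then the plain text, and keep the first value found. Here is a minimal library-free sketch of that behavior, assuming that is how scrAPI treats an array of extractors; Element and extract_value are hypothetical names.

```ruby
# Hypothetical stand-in for a parsed HTML element.
Element = Struct.new(:tag, :attrs, :text)

# Mirrors the extractor list ["abbr@title", "a@href", :text]:
# a string "tag@attr" applies only to that tag and reads that attribute,
# the symbol :text reads the element's text; the first non-nil value wins.
EXTRACTORS = ["abbr@title", "a@href", :text].freeze

def extract_value(element, extractors = EXTRACTORS)
  extractors.each do |extractor|
    value =
      if extractor == :text
        element.text
      else
        tag, attr = extractor.split("@", 2)
        element.attrs[attr] if element.tag == tag
      end
    return value unless value.nil?
  end
  nil
end

abbr = Element.new("abbr", { "title" => "37.386013" }, "N 37 24.491")
span = Element.new("span", {}, "37.386013")
link = Element.new("a", { "href" => "http://example.com/" }, "home page")

puts extract_value(abbr) # prints 37.386013 (from the title attribute)
puts extract_value(span) # prints 37.386013 (from the text)
puts extract_value(link) # prints http://example.com/
```

The abbr case is the one the update is about: the human-readable text ("N 37 24.491") differs from the machine-readable value kept in title.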
August 8th, 2006 at 4:27 am
[...] Microformats Ruby Parser - Certainly worth a look for all Ruby followers (Frank?) [...]
August 28th, 2006 at 10:51 pm
[...] Check out what Andrew is doing for Microformat parsing: Now, there is a base-class Microformat that provides a structure for any sub-class to be created that specifies a Microformat that you want to parse. [...]