Microformat Ruby Parser

With help from Assaf, I've generalized the Microformat parsers even further. There is now a base class, Microformat, that provides the structure for any subclass describing a microformat you want to parse.

It's not robust: it assumes that the important information is in the text of the tag and that nothing is in any of the tag's attributes. Attribute support should probably be added, with each property given as an array that specifies whether its value comes from the text or from a named attribute.
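A minimal, dependency-free sketch of that proposal might look like the following. Element, PROPERTY_SOURCES and extract are hypothetical names, and a plain Struct stands in for a parsed HTML node, so this only illustrates the idea of declaring "text or named attribute" per property; it is not scrAPI code.

```ruby
# Sketch of the proposed design: every property declares where its value
# lives. Element and PROPERTY_SOURCES are hypothetical; a real scraper would
# receive parsed HTML nodes instead of these structs.
Element = Struct.new(:name, :attributes, :text)

# Each pair is [property, source]: :text means "use the tag's text",
# a string names the attribute to read.
PROPERTY_SOURCES = [
  [:latitude, :text],
  [:longitude, :text],
  [:url, "href"]
].to_h.freeze

# Look up the declared source for +property+ and read it from +element+.
def extract(property, element)
  source = PROPERTY_SOURCES.fetch(property)
  source == :text ? element.text.strip : element.attributes[source]
end

lat  = Element.new("span", { "class" => "latitude" }, " 37.7749 ")
link = Element.new("a", { "class" => "url", "href" => "http://example.com/" }, "home")

puts extract(:latitude, lat)  # prints 37.7749
puts extract(:url, link)      # prints http://example.com/
```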


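require "scrapi"

# Base class: each property name becomes a CSS class selector, with
# underscores turned into hyphens (:street_address -> .street-address).
class Microformat < Scraper::Base
  def self.properties(*symbols)
    symbols.each do |symbol|
      html_class = symbol.to_s.gsub(/_/, "-")
      # Prefer an abbr's title, then a link's href, then the element's text.
      process ".#{html_class}", symbol => ["abbr@title", "a@href", :text]
    end
  end
end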
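To show the class-macro pattern on its own, here is a sketch that runs without the scrAPI gem. MiniMicroformat, MiniGeo and MiniAdr are hypothetical: instead of calling process, the macro just records which selector each property would be scraped from, which makes the naming convention easy to check.

```ruby
# Dependency-free sketch of the class-macro pattern behind Microformat.
# MiniMicroformat is hypothetical: it records the selector each property
# maps to instead of calling scrAPI's process.
class MiniMicroformat
  # Class-level instance variable, so each subclass keeps its own table.
  def self.selectors
    @selectors ||= {}
  end

  def self.properties(*symbols)
    symbols.each do |symbol|
      html_class = symbol.to_s.gsub(/_/, "-")  # :postal_code -> "postal-code"
      selectors[symbol] = ".#{html_class}"
    end
  end
end

class MiniGeo < MiniMicroformat
  properties :latitude, :longitude
end

class MiniAdr < MiniMicroformat
  properties :street_address, :postal_code, :country_name
end

MiniAdr.selectors[:country_name]  # => ".country-name"
MiniGeo.selectors.keys            # => [:latitude, :longitude]
```

Because the table lives in a class-level instance variable, each subclass gets its own selectors and the base class stays empty.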

class Geo < Microformat
  properties :latitude, :longitude
end

class Adr < Microformat
  properties :post_office_box,
             :extended_address,
             :street_address,
             :locality,
             :region,
             :postal_code,
             :country_name
end

To use them, you just need a general-purpose scraper:


class Location < Scraper::Base
  array :geos
  array :adrs
  process ".adr", :adrs => Adr
  process ".geo", :geos => Geo
  result :geos, :adrs
end

This returns an object whose geos and adrs methods hold the geo and adr microformats found on the page, respectively.
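Consuming that object is plain Ruby. The sketch below uses hypothetical Struct stand-ins (GeoResult, AdrResult, LocationResult) with made-up sample values in place of a live scrAPI run, so it runs without the gem; it only assumes that each geo and adr exposes one reader per declared property.

```ruby
# Hypothetical stand-ins for the objects a Location scrape returns; the real
# ones come from scrAPI, these only mimic their readers.
GeoResult = Struct.new(:latitude, :longitude)
AdrResult = Struct.new(:locality, :postal_code)
LocationResult = Struct.new(:geos, :adrs)

# Turn a Location-style result into printable lines, one per microformat.
def summarize(location)
  geo_lines = location.geos.map { |g| "geo: #{g.latitude},#{g.longitude}" }
  adr_lines = location.adrs.map { |a| "adr: #{a.locality} #{a.postal_code}" }
  geo_lines + adr_lines
end

location = LocationResult.new(
  [GeoResult.new("37.7749", "-122.4194")],
  [AdrResult.new("San Francisco", "94103")]
)

puts summarize(location)
```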

Update: Assaf pointed out that he has added the ability to extract the correct value whether a microformat property is stored in an abbr (using its title attribute) or in a span (using its text):
process ".#{html_class}", symbol=>["abbr@title", "a@href", :text]
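The fallback order in that extractor list can be sketched without the gem: try each extractor in turn and keep the first one that yields a value. Node, EXTRACTORS, apply_extractor and extract_value are hypothetical names, and this mimics the behaviour described above rather than reproducing scrAPI's own implementation.

```ruby
# Sketch of how ["abbr@title", "a@href", :text] resolves a value: try each
# extractor in order and keep the first non-nil result. Node is a
# hypothetical stand-in for a parsed element; this mimics the described
# behaviour and is not scrAPI's own implementation.
Node = Struct.new(:tag, :attrs, :text)

EXTRACTORS = ["abbr@title", "a@href", :text].freeze

# "tag@attr" reads +attr+ only when the node is that tag; :text reads the text.
def apply_extractor(extractor, node)
  return node.text if extractor == :text

  tag, attr = extractor.split("@", 2)
  node.tag == tag ? node.attrs[attr] : nil
end

def extract_value(node)
  EXTRACTORS.each do |extractor|
    value = apply_extractor(extractor, node)
    return value unless value.nil?
  end
  nil
end

abbr = Node.new("abbr", { "title" => "37.7749" }, "N 37 46")
span = Node.new("span", {}, "37.7749")
link = Node.new("a", { "href" => "http://example.com/" }, "example")

puts extract_value(abbr)  # prints 37.7749 (from the title attribute)
puts extract_value(span)  # prints 37.7749 (from the text)
puts extract_value(link)  # prints http://example.com/
```

An abbr without a title attribute falls through to its text, so a property still resolves when the machine-readable value is missing.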

About this article

posted in Programming, Ruby

About the Author

Andrew Turner is an advocate of open standards and open data. He is actively involved in many organizations developing and supporting open standards, including OpenStreetMap, Open Geospatial Consortium, Open Web Foundation, OSGeo, and the World Wide Web Consortium. He co-founded CrisisCommons, a community of volunteers that, in coordination with government agencies and disaster response groups, build technology tools to help people in need during and after a crisis such as an earthquake, tsunami, tornado, hurricane, flood, or wildfire.