Location
London, England


Microformat Ruby Parser

Published in Programming, Ruby  |  2 Comments


With help from Assaf, I’ve generalized the microformat parsers even further. There is now a base class, Microformat, from which you can derive a subclass for each microformat you want to parse.

It isn’t fully robust: it assumes the important information is in the tag’s text rather than in any of its attributes (see the update below for the abbr and a cases). Attribute support should probably be added, with each property declaring whether its value comes from the text or from a named attribute.
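One way that per-property declaration could look is sketched below in plain Ruby, without scrAPI, so it runs on its own. Each property maps either to `:text` or to the name of an attribute. `Element`, `SOURCES` and `read_property` are hypothetical names for illustration; they are not part of scrAPI.

```ruby
# Hypothetical stand-in for a parsed HTML element (not a scrAPI type).
Element = Struct.new(:name, :attrs, :text)

# Hypothetical per-property declaration: :text means "use the element's text",
# a String means "use the attribute with this name".
SOURCES = {
  latitude:  "title",  # e.g. <abbr class="latitude" title="51.5">N 51</abbr>
  longitude: "title",
  locality:  :text     # e.g. <span class="locality">London</span>
}.freeze

# Reads a property's value from an element according to its declared source,
# defaulting to the element's text for undeclared properties.
def read_property(property, element)
  source = SOURCES.fetch(property, :text)
  source == :text ? element.text.to_s.strip : element.attrs[source]
end

lat  = Element.new("abbr", { "title" => "51.5" }, "N 51")
city = Element.new("span", {}, " London ")
puts read_property(:latitude, lat)   # prints 51.5
puts read_property(:locality, city)  # prints London
```

Keeping the source next to the property name means a subclass like Geo would still declare everything in one place; only the lookup changes.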


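require "rubygems"
require "scrapi"

# Base class: each property name maps to an HTML class (underscores become
# hyphens), read from abbr@title, a@href, or the element's text, in that order.
class Microformat < Scraper::Base
  def self.properties(*symbols)
    symbols.each do |symbol|
      html_class = symbol.to_s.tr("_", "-")
      process ".#{html_class}", symbol => ["abbr@title", "a@href", :text]
    end
  end
end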

class Geo < Microformat
  properties :latitude, :longitude
end
class Adr < Microformat
  properties :post_office_box,
    :extended_address,
    :street_address,
    :locality,
    :region,
    :postal_code,
    :country_name
end
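To see what `properties` registers for a subclass, the sketch below swaps scrAPI's `Scraper::Base` for a hypothetical `RecordingBase` whose `process` only records each rule instead of scraping. The macro body follows the Microformat class above; `RecordingBase`, `rules`, `MicroformatSketch` and `AdrSketch` are illustration-only names, not scrAPI API.

```ruby
# Hypothetical stand-in for Scraper::Base: `process` records rules instead of scraping.
class RecordingBase
  def self.rules
    @rules ||= []  # class-level instance variable, so each subclass keeps its own list
  end

  def self.process(selector, mapping)
    rules << [selector, mapping]
  end
end

# Same macro as the Microformat class, built on the stand-in.
class MicroformatSketch < RecordingBase
  def self.properties(*symbols)
    symbols.each do |symbol|
      html_class = symbol.to_s.tr("_", "-")  # :street_address -> "street-address"
      process ".#{html_class}", symbol => ["abbr@title", "a@href", :text]
    end
  end
end

class AdrSketch < MicroformatSketch
  properties :street_address, :postal_code, :country_name
end

AdrSketch.rules.each { |selector, mapping| puts "#{selector} -> #{mapping.keys.first}" }
# .street-address -> street_address
# .postal-code -> postal_code
# .country-name -> country_name
```

Because the rules live on each subclass rather than on the shared base, Geo and Adr do not see each other's selectors.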

To use them, you just write an ordinary scraper that hands each match to the right class:


class Location < Scraper::Base
  array :geos
  array :adrs
  process ".adr", :adrs => Adr
  process ".geo", :geos => Geo
  result :geos, :adrs
end

This returns an object whose geos and adrs readers hold arrays of the parsed geo and adr microformats, respectively.
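Consuming that result might look like the sketch below. It assumes the returned object answers `geos` and `adrs`, each an array of objects with readers for the declared properties; Struct stand-ins replace the real scrAPI objects so the sketch runs without the gem, and `summarize` is a hypothetical helper.

```ruby
# Stand-ins for what Location would return (assumption: one reader per property).
GeoResult      = Struct.new(:latitude, :longitude)
AdrResult      = Struct.new(:street_address, :locality, :postal_code, :country_name)
LocationResult = Struct.new(:geos, :adrs)

# Hypothetical helper: one line per geo and adr found on the page,
# skipping adr properties that were not present (nil).
def summarize(location)
  geo_lines = location.geos.map { |g| "geo: #{g.latitude}, #{g.longitude}" }
  adr_lines = location.adrs.map do |a|
    parts = [a.street_address, a.locality, a.postal_code, a.country_name]
    "adr: #{parts.compact.join(', ')}"
  end
  geo_lines + adr_lines
end

result = LocationResult.new(
  [GeoResult.new("51.5", "-0.12")],
  [AdrResult.new(nil, "London", nil, "England")]
)
puts summarize(result)
# geo: 51.5, -0.12
# adr: London, England
```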

Update: Assaf pointed out that he has added the ability to pull out the correct value depending on whether the microformat property is marked up with an abbr or a span. Each property now gets a list of extractors:
process ".#{html_class}", symbol=>["abbr@title", "a@href", :text]
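The array reads as an ordered list of fallbacks. A plain-Ruby approximation is below, assuming a candidate is skipped when it does not apply (wrong tag name, or a missing or blank value). `Tag`, `RULES` and `extract_value` are hypothetical names, and this mimics the described behaviour rather than reproducing scrAPI's own code.

```ruby
# Hypothetical stand-in for an HTML element.
Tag = Struct.new(:name, :attrs, :text)

# Ordered fallbacks mirroring ["abbr@title", "a@href", :text]:
# each rule is [required tag name, or nil for any tag; attribute name or :text].
RULES = [["abbr", "title"], ["a", "href"], [nil, :text]].freeze

def extract_value(tag)
  RULES.each do |name, source|
    next if name && tag.name != name  # skip rules meant for other tag names
    value = source == :text ? tag.text : tag.attrs[source]
    return value.strip unless value.nil? || value.strip.empty?
  end
  nil  # no rule produced a value
end

puts extract_value(Tag.new("abbr", { "title" => "2006-08-28" }, "Aug 28"))  # prints 2006-08-28
puts extract_value(Tag.new("a", { "href" => "http://example.com/" }, "home")) # prints http://example.com/
puts extract_value(Tag.new("span", {}, "London"))                             # prints London
```

An abbr without a title still falls through to its text, which is what you want for markup that uses abbr only for display.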



Responses

  1. Semantic Web Links 08-08-06 at pixelsebi’s repository says:

    August 8th, 2006 at 4:27 am

    [...] Microformats Ruby Parser - surely worth a look for all Ruby devotees (Frank?) [...]

  2. Labnotes » Microformat Ruby Parser says:

    August 28th, 2006 at 10:51 pm

    [...] Check out what Andrew is doing for Microformat parsing: Now, there is a base-class Microformat that provides a structure for any sub-class to be created that specifies a Microformat that you want to parse. [...]
