Converting table-based Calendars to hCalendar

I am looking over the FOSS4G schedule of sessions. It's all table-based, which makes it somewhat difficult to find specific tracks, rooms, etc. So I took the table-based, non-semantic calendar and converted it into a much more useful hCalendar output, which can easily be translated to iCal for your subscription fun using Brian Suda's X2V.

You can get the hCalendar here and the iCal link here.
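X2V does the hCalendar-to-iCal step for you. To show what that step produces, here is a minimal sketch that writes iCalendar (RFC 5545) text directly from scraped session data. It is not X2V and not the original parser. The method names `ical_escape` and `to_ical`, the event hash keys, the PRODID and the UID are all hypothetical, and long-line folding at 75 octets is left out for brevity.

```ruby
# Escapes a TEXT property value as RFC 5545 requires:
# backslash, semicolon and comma get a leading backslash; newlines become \n.
def ical_escape(text)
  text.to_s.gsub(/[\\;,]/) { |c| "\\#{c}" }.gsub("\n") { "\\n" }
end

# Builds a VCALENDAR from event hashes. The hash keys are hypothetical;
# line folding at 75 octets is omitted to keep the sketch short.
def to_ical(events, prodid: "-//Example//hCalendar to iCal sketch//EN")
  stamp = Time.now.utc.strftime("%Y%m%dT%H%M%SZ") # DTSTAMP is required per event
  lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:#{prodid}"]
  events.each do |e|
    lines << "BEGIN:VEVENT"
    lines << "UID:#{e[:uid]}"
    lines << "DTSTAMP:#{stamp}"
    lines << "DTSTART:#{e[:dtstart]}" # floating local time, e.g. 20060912T070000
    lines << "DTEND:#{e[:dtend]}" if e[:dtend]
    lines << "SUMMARY:#{ical_escape(e[:summary])}"
    lines << "LOCATION:#{ical_escape(e[:location])}" if e[:location]
    lines << "END:VEVENT"
  end
  lines << "END:VCALENDAR"
  lines.map { |l| "#{l}\r\n" }.join # iCalendar content lines end in CRLF
end

puts to_ical([{
  uid: "registration-20060912@example.org", # hypothetical UID
  dtstart: "20060912T070000",
  dtend: "20060912T090000",
  summary: "Registration",
  location: "Amphipôle (niv. 3)"
}])
```

Escaping matters because a comma or semicolon in a talk title would otherwise be read as a value separator by calendar clients.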

The Problem

Here is the current HTML of the schedule. As you can see, it is an absolute mess of a DOM: this table is already the fourth nested table (tables within tables within tables, oh my!).

<table celspacing="0" cellpadding="0" bgcolor="#E6E6E6">

In the middle are some actually interesting bits, such as the presentation title, author and times. So what we need to do is walk through all of this and build up a model of the conference.
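One way to "build up a conference" is to collect each scraped session into a small record and then nest the records by day and room, so that a single room or track is easy to pull out. This is a sketch under assumptions: `Session`, `parse_room_and_times` and `build_conference` are hypothetical names, not part of scrAPI or the original parser, and the room/time pattern is inferred from the schedule line "Amphipôle (niv. 3): 07:00 - 09:00".

```ruby
# Hypothetical record for one scraped session; times stay as "HH:MM" strings.
Session = Struct.new(:title, :day, :room, :starts, :ends)

# Room name, a colon, then "HH:MM - HH:MM" (format inferred from the schedule).
ROOM_TIME = /\A(?<room>.+):\s*(?<starts>\d{2}:\d{2})\s*-\s*(?<ends>\d{2}:\d{2})\z/

# Splits "Amphipôle (niv. 3): 07:00 - 09:00" into [room, start, end].
def parse_room_and_times(text)
  m = ROOM_TIME.match(text.strip)
  raise ArgumentError, "unrecognized room/time: #{text.inspect}" unless m
  [m[:room].strip, m[:starts], m[:ends]]
end

# Nests sessions as { day => { room => [sessions sorted by start time] } }.
def build_conference(sessions)
  sessions.group_by(&:day).transform_values do |on_day|
    on_day.group_by(&:room).transform_values { |in_room| in_room.sort_by(&:starts) }
  end
end

room, starts, ends = parse_room_and_times("Amphipôle (niv. 3): 07:00 - 09:00")
sessions = [
  Session.new("Registration", "2006-09-12", room, starts, ends),
  # End time left nil: it is not given in the sample output.
  Session.new("Enabling Users to Produce personalized Geodata", "2006-09-15",
              "Amphimax MAX 350", "10:30", nil)
]
conference = build_conference(sessions)
conference["2006-09-12"]["Amphipôle (niv. 3)"].map(&:title) # => ["Registration"]
```

Keeping times as plain strings is enough for sorting within a day; converting them to real timestamps can wait until the hCalendar or iCal output step.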

The Solution

Using some slick Ruby scripting and the very useful scrAPI from Assaf, we can define scrapers that walk over the multiple days and then grab each of the sessions within those days. These are then output in proper hCalendar format, like:

  Enabling Users to Produce personalized Geodata
  Mr. Andrew TURNER (HighEarthOrbit)
  Friday, 15 September 2006 from 10:30-
at the Amphimax MAX 350
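The rendered text above has lost its markup. As a rough idea of what an hCalendar event looks like underneath, here is a minimal sketch that emits one `vevent` using the standard hCalendar class names (`summary`, `description`, `dtstart`, `location`) and the `abbr` pattern for dates. The method names `h` and `hcalendar_event` are hypothetical, and putting the speaker in `description` is an assumption, not necessarily what the original output does.

```ruby
# Minimal HTML escaping for text placed inside the markup.
def h(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# Renders one session as an hCalendar "vevent". The date uses the abbr pattern:
# ISO 8601 in the title attribute, human-readable text as the element content.
# Using "description" for the speaker is an assumption for illustration.
def hcalendar_event(summary:, description:, dtstart:, dtstart_text:, location:)
  <<~HTML
    <div class="vevent">
      <span class="summary">#{h(summary)}</span>
      <span class="description">#{h(description)}</span>
      <abbr class="dtstart" title="#{h(dtstart)}">#{h(dtstart_text)}</abbr>
      at the <span class="location">#{h(location)}</span>
    </div>
  HTML
end

puts hcalendar_event(
  summary: "Enabling Users to Produce personalized Geodata",
  description: "Mr. Andrew TURNER (HighEarthOrbit)",
  dtstart: "2006-09-15T10:30",
  dtstart_text: "Friday, 15 September 2006 from 10:30",
  location: "Amphimax MAX 350"
)
```

The `abbr` title carries the machine-readable time that a converter such as X2V reads, while visitors still see the friendly date.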

The parser makes working through the nightmare above fairly simple, but because the markup has no proper classes or ids (every presentation has id="entry", eep!), we have to find the bits we want by their presentational markup attributes. That's not recommended, but at least it is nicer than trying to navigate 10 levels of DOM starting at the root.
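To illustrate selecting by attributes rather than classes, here is a minimal sketch using Ruby's REXML and XPath instead of scrAPI (a swapped-in technique, not the original parser). It assumes a tidied, well-formed fragment; the markup below is a hypothetical reconstruction in the spirit of the schedule, and `SCHEDULE` and `extract_sessions` are illustrative names.

```ruby
require 'rexml/document'

# Hypothetical, tidied (well-formed) fragment modelled on the schedule:
# no classes, every presentation row has the same id="entry", and the
# only other hooks are presentational attributes such as bgcolor.
SCHEDULE = <<~'XHTML'
  <table cellspacing="0" cellpadding="0" bgcolor="#E6E6E6">
    <tr><td>
      <table cellspacing="1" width="100%">
        <tr id="entry"><td><b>Getting Started with MapServer</b><br/>by Mr. Jeff MCKENNA (DM Solutions Group)</td></tr>
        <tr id="entry"><td><b>Enabling Users to Produce personalized Geodata</b><br/>by Mr. Andrew TURNER (HighEarthOrbit)</td></tr>
      </table>
    </td></tr>
  </table>
XHTML

# Selects session rows purely by attribute values: rows with id="entry"
# anywhere inside the grey (#E6E6E6) table. Duplicate ids are invalid HTML,
# but XPath simply treats id as an ordinary attribute here.
def extract_sessions(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, '//table[@bgcolor="#E6E6E6"]//tr[@id="entry"]').map do |row|
    cell = REXML::XPath.first(row, "td")
    title = REXML::XPath.first(cell, "b").text.strip
    # The speaker is the cell's bare text after <br/>, prefixed with "by ".
    speaker = cell.texts.map(&:value).join.strip.sub(/\Aby\s+/, "")
    { title: title, speaker: speaker }
  end
end

extract_sessions(SCHEDULE).each { |s| puts "#{s[:title]} / #{s[:speaker]}" }
```

Real-world table soup usually needs a tidy pass before an XML parser will accept it, and selectors tied to colours or widths break as soon as the page is restyled, which is exactly why proper classes are preferable.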

You can see the parser here.

About this article

posted in Programming, Ruby, FOSS4G
Tuesday, 12 September 2006
<table cellspacing="1" width="100%" style="padding:3px; border-top:1px solid #E6E6E6;border-bottom:1px solid #E6E6E6;">

07:00 Registration
(Amphipôle (niv. 3): 07:00 - 09:00)

[20] Getting Started with MapServer

by Mr. Jeff MCKENNA (DM Solutions Group)