I have been looking over the FOSS4G schedule of sessions. It’s all table-based, and it’s somewhat difficult to find specific tracks, rooms, etc. So I took the table-based, non-semantic calendar and converted it into much more useful hCalendar output, which can easily be translated to iCal for your subscription fun using Brian Suda’s X2V.
Here is the current HTML of the schedule. As you can see, the DOM is an absolute mess. This table is in fact already the 4th embedded table (tables-within-tables-within-tables, oh my!).
Tuesday, 12 September 2006
(Amphipôle (niv. 3): 07:00 - 09:00)
 Getting Started with MapServer
by Mr. Jeff MCKENNA (DM Solutions Group)
In the middle there are some actually interesting bits, such as the presentation title, author, times, etc. So what we need to do is walk through all this and build up a conference.
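To give a feel for "pulling out the interesting bits", here is a minimal, stdlib-only Ruby sketch that works on the plain-text lines of the sample entry above. It is not the real parser, which runs scrAPI against the HTML. The `Session` struct, the `parse_entry` helper, and both regular expressions are my own assumptions, guessed from that single sample entry.

```ruby
# Hypothetical container for one scraped session (field names are assumptions).
Session = Struct.new(:title, :speaker, :org, :room, :starts, :ends)

# Patterns guessed from the one sample entry; other entries may need tweaks.
ROOM_AND_TIME = /\A\((.+): (\d{2}:\d{2}) - (\d{2}:\d{2})\)\z/
SPEAKER       = /\Aby (.+) \((.+)\)\z/

# Takes the three text lines of an entry: room/time, title, speaker.
# Returns a Session, or nil when a line does not look as expected.
def parse_entry(lines)
  room_line, title_line, speaker_line = lines.map(&:strip)
  room = ROOM_AND_TIME.match(room_line)
  who  = SPEAKER.match(speaker_line)
  return nil unless room && who

  Session.new(title_line, who[1], who[2], room[1], room[2], room[3])
end

entry = [
  "(Amphipôle (niv. 3): 07:00 - 09:00)",
  " Getting Started with MapServer",
  "by Mr. Jeff MCKENNA (DM Solutions Group)"
]
session = parse_entry(entry)
puts "#{session.title} by #{session.speaker} (#{session.org})"
puts "#{session.room}, #{session.starts} to #{session.ends}"
```

Returning nil for anything unexpected keeps a stray layout row from blowing up the whole run; you would skip those and carry on with the next entry.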
With some slick Ruby scripting and the very useful scrAPI from Assaf, we can define scrapers that walk over the multiple days and, within each day, grab each of the sessions. These are then output in proper hCalendar format, like:

<span class="vevent">
  <a class="url" href="http://www.foss4g2006.org/contributionDisplay.py?contribId=189&sessionId=46&confId=1">
    <span class="summary">Enabling Users to Produce personalized Geodata</span>
    <span class="details"><span class="vcard"><span class="fn">Mr. Andrew TURNER</span><span class="org">HighEarthOrbit</span></span></span>
    <abbr class="dtstart" title="2006-09-15T10:30:00Z">Friday, 15 September 2006 from 10:30</abbr>-
    <abbr class="dtend" title="2006-09-15T11:00:00Z">11:00</abbr>, at the
    <span class="location">Amphimax MAX 350</span>
  </a>
</span>
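As a rough sketch of the output step, the Ruby below renders one session into markup shaped like the example above. The hash keys, the `h` escaping helper, and `to_hcalendar` are hypothetical names of mine, not taken from the actual parser. It mirrors the `Z` suffix used in the example timestamps, and unlike the example it escapes `&` in the URL.

```ruby
# Minimal HTML escaping so titles or URLs containing &, <, > or " stay valid.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def h(text)
  text.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Render one session (a plain Hash with assumed keys) as an hCalendar vevent.
def to_hcalendar(session)
  stamp  = "%Y-%m-%dT%H:%M:%SZ"   # machine-readable value for the abbr title
  starts = session[:starts]
  ends   = session[:ends]
  <<~HTML
    <span class="vevent">
      <a class="url" href="#{h(session[:url])}">
        <span class="summary">#{h(session[:title])}</span>
        <span class="details"><span class="vcard"><span class="fn">#{h(session[:speaker])}</span><span class="org">#{h(session[:org])}</span></span></span>
        <abbr class="dtstart" title="#{starts.strftime(stamp)}">#{starts.strftime('%A, %-d %B %Y from %H:%M')}</abbr>-
        <abbr class="dtend" title="#{ends.strftime(stamp)}">#{ends.strftime('%H:%M')}</abbr>, at the
        <span class="location">#{h(session[:room])}</span>
      </a>
    </span>
  HTML
end

puts to_hcalendar({
  url:     "http://www.foss4g2006.org/contributionDisplay.py?contribId=189&sessionId=46&confId=1",
  title:   "Enabling Users to Produce personalized Geodata",
  speaker: "Mr. Andrew TURNER",
  org:     "HighEarthOrbit",
  room:    "Amphimax MAX 350",
  starts:  Time.utc(2006, 9, 15, 10, 30),
  ends:    Time.utc(2006, 9, 15, 11, 0)
})
```

Keeping the human-readable text inside the `abbr` and the timestamp in its `title` is what lets a converter such as X2V pick up the start and end times.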
The parser linked below makes parsing the nightmare above fairly simple, but due to the lack of any proper classes or ids (each presentation is id="entry", eep!), we have to find the bits we want by their current markup attributes. Not recommended, but at least this is nicer than trying to figure out the 10 levels of DOM starting at the root.
You can see the parser here.