Last week I made a quick statement sharing my concern about civic organizations promoting ETL (extract, transform, and load) of open data instead of developing APIs. I felt it warranted a more thorough response than the terseness of microbursts allows.
Desire Lines and Road Surfaces
Walk through most parks or any college campus and you will quickly notice worn dirt paths that connect the sidewalks, marking pedestrian shortcuts. These desire lines reveal an initial and repeated optimization that lies outside the paved paths. Often these ad hoc networks become a type of ‘footstream’ that is eventually adopted and paved; otherwise they are left to individual use, muddy in the rain, undocumented, and unsupported by groundskeeping. At scale, this pattern explains entire city, county, and even national road networks that started as ‘cowpaths’ and, through continued and growing use, became official infrastructure (roads and highways) that businesses rely upon as a matter of course.
This road network is the infrastructure that government develops and promises to support as a necessary mechanism for citizens to build communities and for businesses to conduct commerce. Information infrastructure is the next generation of what government is developing, and it increasingly provides the tools that community and commerce rely upon and require. Tim O’Reilly has described this as “Government as a Platform,” which means that we must be able to rely on these services as a durable backbone on which to build our numerous and diverse applications.
Opening Data: Prototype or Infrastructure
Open data started as simple file sharing. In my own city, the data catalog was a large, easy-to-read list of datasets with metadata, links to common formats, and last-updated dates. Through a series of public contests, developers used these file downloads to build some compelling applications that highlighted the future of Government IT. In 2008, Apps for Democracy winners included iLive.at (link dead) and ParkItDC.com (link dead), and in 2009 the Apps for America winner was 311.socialdc.org (link also dead).
It should be apparent that these contests and applications were interesting desire lines, but they did not provide or sustain an information platform that citizens could rely on. Though simple examples, they reflect a tendency to build one-off applications that miss the next step of becoming part of the platform they seek to improve. I’ve heard similar examples from other cities, where civic hackers created well-meaning and well-built applications that sit so far outside existing government operations that they require continued manual maintenance by unpaid volunteers. The common outcome is that the service stops updating, sometimes while it is still running, which is arguably worse than simply shutting down.
Unfortunately, even the original data catalog has slowly atrophied. When I went looking for more recent crime data (the catalog stops around September 2013), I learned that the internal system was being migrated, the transformation process had faltered, and getting the feed back online simply wasn’t a priority. Maintaining a separate feed on any defined timeline was too far removed from the agency’s actual job of analyzing and responding to crime.
This is a stark reminder that, despite the amazing capabilities technology can deliver, government is foremost responsible for serving people, not serving technology. Everything it does is ultimately in service of the communities that elect, fund, and in many cases are employed by these governments. When civil engineers design roads, they generally don’t indulge in grandiose aesthetics and creativity. They open their codes and standards, determine the appropriate concrete mixture, depth, and rebar from the specifications, and get to work building a road that delivers the expected, reliable operation citizens need.
Operational Open Data is Sustainable Open Data
Numerous studies, reports, case studies, and general community practice have made the case that open data has great potential benefits for civilian and business communities. Not least of these is the ability of government agencies to share data with one another more easily, in addition to improving business efficiency and consumer decision making.
Government has a difficult and extremely important job. As an entity, it is not enamored with new techniques or formats. Attempts to create one-way forks of the data (copies that flow out of government but never back into its operations) create strain that will eventually give way under any pressure, whether time, budget, or personnel. Proponents of new technologies need to understand these processes and costs, and align with them, if those technologies are ever to become part of the government platform.
For open data to move from a shortcut to part of the stable infrastructure, we need to design it from the beginning to be practical, sustainable and ultimately operational. Open data needs to be the way government operates, and it needs to be part of the living systems that manage and process the data as part of day-to-day business.
To return to the original question: generalized techniques such as ETL (extract, transform, and load) offer tremendous flexibility for exploring new paths and opportunities. By allowing the freedom to explore new applications, formats, and communities, government can observe and understand these desire lines. It can then decide whether these paths should become part of the supported network or whether they indicate that the larger system needs a redesign to accommodate them. I have personally seen, and built, many ETL tools and community applications that worked outside of government. While fast-moving and extremely agile, they ultimately cannot sustain ongoing, durable platforms for information access.
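To make the ETL pattern concrete, here is a minimal Python sketch of the kind of pipeline a volunteer might run outside government. The CSV columns (`REPORT_DAT`, `OFFENSE`, `WARD`) and the function names are hypothetical, not any city's actual schema; the point is only to show the three stages and where such an external copy is fragile.

```python
import csv
import io
import json

# Hypothetical raw export from an agency system. The column names are
# assumptions for illustration, not a real city schema.
RAW_CSV = """REPORT_DAT,OFFENSE,WARD
2013-09-01,THEFT,2
2013-09-02,BURGLARY,6
"""


def extract(text: str) -> list[dict]:
    """Extract: parse rows out of the agency's CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[dict]:
    """Transform: rename fields and normalize values for the new format.

    Raises KeyError if the source renames a column, e.g. after a migration.
    """
    return [
        {
            "date": row["REPORT_DAT"],
            "offense": row["OFFENSE"].title(),
            "ward": int(row["WARD"]),
        }
        for row in rows
    ]


def load(records: list[dict]) -> str:
    """Load: serialize into the target format (here, a JSON document)."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

The published output is only as fresh as the last time someone ran the script, and if an internal migration renames a column, `transform` fails with a `KeyError` until a volunteer notices and patches it.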
By comparison, we should be encouraging and working directly with government technical staff to specify and prototype APIs (application programming interfaces), because they provide an excellent path from prototype to adoption. By developing an external interface to a service, the provider makes a contract with end users that is independent of the implementation details. This lets a developer use their tools of choice, with the intent that the service could later be rebuilt within government infrastructure while keeping the promised interface that applications already rely upon. Finally, much like observing the increasing depth and width of a dirt path, the usage analytics measured behind an API help prioritize which services to incorporate and operationalize.
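A minimal Python sketch of the API-as-contract idea, under stated assumptions: `CrimeReportAPI`, `VolunteerPrototype`, `AgencyService`, `MeteredAPI`, and `count_recent` are hypothetical names, not an existing government API. The application function depends only on the interface, so the backend can move from a volunteer prototype to an agency-run service without changing callers, while a thin wrapper counts calls as a stand-in for real usage analytics.

```python
from collections import Counter
from datetime import date
from typing import Callable, Protocol


class CrimeReportAPI(Protocol):
    """The published contract: callers rely only on this method signature."""

    def reports_since(self, since: date) -> list[dict]:
        ...


class VolunteerPrototype:
    """Hypothetical prototype built outside government from a static extract."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def reports_since(self, since: date) -> list[dict]:
        return [r for r in self._rows if r["date"] >= since]


class AgencyService:
    """Hypothetical later rebuild inside government, backed by a live source."""

    def __init__(self, fetch_live: Callable[[], list[dict]]):
        self._fetch_live = fetch_live  # callable returning current records

    def reports_since(self, since: date) -> list[dict]:
        return [r for r in self._fetch_live() if r["date"] >= since]


class MeteredAPI:
    """Wraps any implementation and counts calls, a stand-in for analytics."""

    def __init__(self, impl: CrimeReportAPI):
        self._impl = impl
        self.calls: Counter = Counter()

    def reports_since(self, since: date) -> list[dict]:
        self.calls["reports_since"] += 1
        return self._impl.reports_since(since)


def count_recent(api: CrimeReportAPI, since: date) -> int:
    """An application that depends only on the contract, not the backend."""
    return len(api.reports_since(since))


if __name__ == "__main__":
    rows = [
        {"date": date(2013, 9, 1), "offense": "Theft"},
        {"date": date(2013, 9, 2), "offense": "Burglary"},
    ]
    prototype = MeteredAPI(VolunteerPrototype(rows))
    official = MeteredAPI(AgencyService(lambda: rows))
    # The same application code runs unchanged against either backend.
    print(count_recent(prototype, date(2013, 9, 2)))
    print(count_recent(official, date(2013, 9, 1)))
    print(prototype.calls)
```

The design choice here is that swapping `VolunteerPrototype` for `AgencyService` is invisible to `count_recent`; that invisibility is what lets a prototype be operationalized later without breaking the applications built on it.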
The most recent exemplar of this is the DCAT distributed catalog specification. Though neither new nor novel as far as federated data catalogs are concerned, it is an API that was created jointly by technologists and government agencies and adopted independently of any particular technology implementation. It is now poised to make it easy to share links to data among numerous national and local government agencies, all in public. Instead of building ever more data harvesters, an API means that anyone can participate in both the production and the use of open data in whatever way best fits their needs.
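The sketch below illustrates the federated-catalog idea in Python. It assumes catalogs published as `data.json` documents in the DCAT-based layout used by Project Open Data (keys such as `dataset`, `identifier`, `distribution`, `downloadURL`, and `accessURL`); the two catalogs, their URLs, and the `federate` and `distribution_links` helpers are hypothetical, and the field names should be checked against the current specification. Because every publisher follows the same shape, one small consumer can merge several agencies' catalogs and list their data links.

```python
import json

# Hypothetical city catalog in a data.json-style layout. Key names are modeled
# on the DCAT-based Project Open Data schema; verify them against the spec.
CITY_CATALOG = json.loads("""
{
  "@type": "dcat:Catalog",
  "dataset": [
    {
      "@type": "dcat:Dataset",
      "identifier": "city-crime-incidents",
      "title": "Crime Incidents",
      "modified": "2013-09-30",
      "distribution": [
        {
          "@type": "dcat:Distribution",
          "downloadURL": "https://data.example.org/crime.csv",
          "mediaType": "text/csv"
        }
      ]
    }
  ]
}
""")

# Hypothetical county catalog that exposes an API endpoint instead of a file.
COUNTY_CATALOG = {
    "@type": "dcat:Catalog",
    "dataset": [
        {
            "@type": "dcat:Dataset",
            "identifier": "county-parcels",
            "title": "Property Parcels",
            "modified": "2014-01-15",
            "distribution": [
                {"@type": "dcat:Distribution", "accessURL": "https://api.example.org/parcels"}
            ],
        }
    ],
}


def federate(*catalogs: dict) -> dict:
    """Merge catalogs, keeping the first dataset seen for each identifier."""
    merged: dict[str, dict] = {}
    for catalog in catalogs:
        for dataset in catalog.get("dataset", []):
            merged.setdefault(dataset["identifier"], dataset)
    return {"@type": "dcat:Catalog", "dataset": list(merged.values())}


def distribution_links(catalog: dict):
    """Yield (title, url, media type) per distribution, preferring downloadURL."""
    for dataset in catalog.get("dataset", []):
        for dist in dataset.get("distribution", []):
            url = dist.get("downloadURL") or dist.get("accessURL")
            if url:
                yield dataset["title"], url, dist.get("mediaType")


if __name__ == "__main__":
    combined = federate(CITY_CATALOG, COUNTY_CATALOG)
    for title, url, media_type in distribution_links(combined):
        print(f"{title}: {url} ({media_type or 'unspecified'})")
```

Deduplicating by `identifier` is one simple policy for overlapping catalogs; a real aggregator might instead prefer the entry with the latest `modified` date.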
Desire to Collaboratively Craft
Perhaps the most exciting thing I have observed in my six years living in DC and watching the Open Government movement surge has been the growing enthusiasm among people within government to collaborate actively and publicly. Going beyond merely publishing a catalog and running a competition, government representatives are eager to talk about ideas, share code and data, and hear where they can open their infrastructure to these kinds of creative developments.
While much of the commercial web is becoming ‘appified’ (and often eschewing access via common or open APIs), perhaps this is one case where it is fortunate that government moves more slowly and is only now entering the era of the programmable web. For many of us who volunteer our time and expertise hoping to improve the civil societies in which we live, the best thing we can do is work closely with government to advise on and create the best platform possible.