GeoWeb Standards – Current Problems
Part 3 in looking at the current state of GeoWeb Standards. See the introduction here.
It’s time to take a hard look across the board at where we’re coming up short and issues that need to be addressed. One way to summarize:
GeoRSS, KML, and GeoJSON are the itching powder, squirting ink pen, and dribble cup of geodata formats.
– Sean Gillies
Sean is known for his candor, and his viewpoint has merit. Overall, the various formats and standards fulfill various needs, but they still don’t provide for all use cases, align well with best practices, or make sense to users and developers.
The simplest overall problem with many of these formats, and how they fit into the Web, is that they lack proper web-type descriptions. One primary mechanism by which Web clients know how to present data is the MIME type. MIME types provide a way for the server to notify clients that the data is in a format such as XML, text, a PNG image, and so on. Standard types are formally registered, but ad-hoc or vendor-specific types are also common.
In addition, MIME types allow crawlers and registries to easily record the type of the file in the metadata.
Looking over our list of various GeoWeb standards, it’s very easy to identify which formats abide by this and which don’t.
Atom, JSON, HTML, and SQLite all provide format specific MIME-Types, allowing clients to easily employ the proper applications. However, none provide a special mechanism for notifying that the data includes geospatial markup. Not necessarily a problem, geo shouldn’t be that special.
KML is perhaps the only format that has a geospatial-specific MIME type. Despite KML now being an OGC standard, though, the MIME type is still the vendor-specific application/vnd.google-earth.kml+xml. KML was particularly ingenious in also providing a unique MIME type for the compressed, or zipped, format: application/vnd.google-earth.kmz.
GML is just XML, so its MIME type is of no use in notifying a client that it should try to pass the data on to a geo-enabled application. And Shapefiles are an agglomeration of multiple files, and even zipped up are only marked as compressed files.
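As a rough illustration of the difference, here is a minimal Python sketch of how a server might pick the Content-Type to advertise for a file. The file names and the `content_type` helper are hypothetical; the two KML types are registered by hand so the result does not depend on the local system's MIME tables.

```python
import mimetypes

# A private registry built from Python's built-in defaults, so the global
# (system-dependent) MIME tables are not consulted or modified.
registry = mimetypes.MimeTypes()

# Register the KML and KMZ types by hand (assumption: we want the
# vendor-specific types discussed above, whatever the defaults contain).
registry.add_type("application/vnd.google-earth.kml+xml", ".kml")
registry.add_type("application/vnd.google-earth.kmz", ".kmz")


def content_type(filename):
    """Return the MIME type to advertise for filename (hypothetical helper)."""
    guessed, _encoding = registry.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"


for name in ("parks.kml", "parks.kmz", "parks_shapefile.zip"):
    print(name, "->", content_type(name))
```

The KML and KMZ files resolve to types a geo-enabled client can act on, while the zipped Shapefile is indistinguishable from any other zip archive.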
More broadly in services, the OGC has a MIME type for service descriptions and responses: application/vnd.ogc.wms_xml, though errors have their own MIME type: application/vnd.ogc.se_xml.
OpenSearch has a special MIME Type, and obviously Tiles and Image files have MIME-types.
Doesn’t matter if you can’t download it
Another major issue facing many of the GeoWeb formats is their file size. Generally, the web bounces back and forth between disregarding sizes due to assumed, ubiquitous high-speed and reliable connectivity, and trying to speed up pages. But even more important is the fact that many potential users don’t have access to high-speed internet, and so there is a huge difference between 10k and 100k or 1MB of data.
To compare the sizes, I took a relatively large dataset from GeoCommons, Statistics Canada, Land and freshwater area, Canada, 2005 and exported it in a variety of formats, both uncompressed, and compressed via standard zip algorithms.
| Format | Size | Zipped |
|---|---|---|
| CSV | 1.3 KB | |
| Shapefile | 5.4 MB | 3.6 MB |
| GeoRSS | 3.3 MB | 1.1 MB |
| KML | 7.3 MB | 2.4 MB |
| Spatialite | 5.4 MB | 3.6 MB |
| JSON | 7.9 MB | 2.3 MB |
CSV just includes latitude and longitude columns of the centroid – so obviously not fully representative. An option would be to include the EWKB in a column for the full geometry – but that is far from any kind of ’standard’ that other tools would know how to interpret.
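To make the comparison easier to read, the following short Python sketch computes how much of each file remains after zipping. The numbers are copied straight from the table; only the variable names are mine.

```python
# Sizes in MB, copied from the table above
# (CSV is omitted because no zipped size is given for it).
sizes_mb = {
    "Shapefile": (5.4, 3.6),
    "GeoRSS": (3.3, 1.1),
    "KML": (7.3, 2.4),
    "Spatialite": (5.4, 3.6),
    "JSON": (7.9, 2.3),
}

# Fraction of the original size that remains after zipping.
ratios = {fmt: zipped / raw for fmt, (raw, zipped) in sizes_mb.items()}

for fmt, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{fmt:<10} zipped is {ratio:.0%} of the uncompressed size")
```

The text formats (GeoRSS, KML, JSON) shrink to roughly a third of their size, while the binary Shapefile and Spatialite files only drop to about two thirds, so the zipped text formats end up smaller than the zipped binary ones.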
Perhaps most surprising from these results is that JSON is so large. Unfortunately, complex geometries require a lot of brackets and separators, and that syntax adds up when representing polygonal data.
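A small Python sketch can give a feel for that overhead. It builds a synthetic polygon (a made-up 1,000-vertex ring, not the Statistics Canada data), encodes it as a GeoJSON-style Feature, and counts how many characters are coordinate digits versus brackets, commas, and keys.

```python
import json
import math


def ring(n, cx=-75.7, cy=45.4, r=0.5):
    """Build a closed ring of n synthetic [lon, lat] vertices around (cx, cy)."""
    pts = [
        [round(cx + r * math.cos(2 * math.pi * i / n), 6),
         round(cy + r * math.sin(2 * math.pi * i / n), 6)]
        for i in range(n)
    ]
    pts.append(pts[0])  # polygon rings repeat the first vertex at the end
    return pts


feature = {
    "type": "Feature",
    "properties": {"name": "example"},
    "geometry": {"type": "Polygon", "coordinates": [ring(1000)]},
}

default_json = json.dumps(feature)                         # ", " and ": " separators
compact_json = json.dumps(feature, separators=(",", ":"))  # no optional whitespace

# Characters that carry coordinate values: digits, minus signs, decimal points.
numeric_chars = sum(ch.isdigit() or ch in "-." for ch in compact_json)
overhead_chars = len(compact_json) - numeric_chars

print("default encoding:", len(default_json), "characters")
print("compact encoding:", len(compact_json), "characters")
print("syntax overhead in compact form:", overhead_chars, "characters")
```

Even with the optional whitespace removed, every vertex still costs a pair of brackets and two commas on top of its digits, which is where the size goes for polygon-heavy data.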
Linkability, Durability, and Discoverability
Moving past purely file format and data type specifications brings up the issue of discoverability and linkability in GeoWeb standards. The Web is more than a list of documents that mention resources; its documents can actually link to durable endpoints that can be resolved, queried, accessed, and parsed.
Non-web native formats have no concept of linking. CSV, Shapefiles, and SQLite contain data, but no links. By contrast, Atom, GML, and KML are chock-full of links, although not always used to great effect. JSON can contain links, but without a schema, who knows what the link means.
Obviously the best model to follow here is HTML, which provides automatic links to feeds, OpenSearch description documents, pages, media, styles, and scripts.
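As a sketch of what that discovery looks like from the client side, the Python below collects the `<link>` elements from a page's head using the standard library HTML parser. The page content, the paths, and the `LinkCollector` class are made up for illustration; advertising a KML alternate this way is an assumption about what a site might do, not something every site offers.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (rel, type, href) for every <link> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            self.links.append(
                (attributes.get("rel", ""),
                 attributes.get("type", ""),
                 attributes["href"])
            )


# Hypothetical page head advertising a feed, a KML alternate,
# an OpenSearch description document, and a stylesheet.
PAGE = """<html><head>
<link rel="alternate" type="application/atom+xml" href="/places.atom">
<link rel="alternate" type="application/vnd.google-earth.kml+xml" href="/places.kml">
<link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml">
<link rel="stylesheet" href="/site.css">
</head><body></body></html>"""

collector = LinkCollector()
collector.feed(PAGE)

for rel, mime, href in collector.links:
    print(f"{rel:<11} {mime or '(no type)':<45} {href}")
```

The combination of `rel` and a MIME type is what lets a client decide which links are feeds, which are search descriptions, and which are alternate geodata formats, without guessing from the URL.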
However, what happens when a resource disappears and is no longer resolvable? How do you know where else to get another version of the same data, and is it the same data? This is becoming a big problem in the larger web, made more problematic by the use of URL shorteners, but also especially disconcerting when it affects the provenance and accuracy of geospatial data.
But without Complexity
While linkability, durability, and discoverability are vital to GeoWeb standards, the cost of complexity inhibits adoption and probability of support.
This is a long argument in many circles – often made more difficult by practitioners that have been working in a field for years or decades and consider the most opaque formats or concepts commonplace. Look to the OWL/RDF/SemanticWeb space for an example of how there is a mismatch between proponents and the general public.
A standard needs to have clear value to developers and users for it to even begin to be considered. No one is going to dive into a dense specification of a format without even knowing why they would want to use it or how it fits into workflows and architecture.
And complexity can also surface in small ways – inconsistent capitalization of element names (you know who you are, KML), or support for a plethora of similar but different flavors that makes it unclear which to use (GeoRSS).
Tools
In this last section of the overall problems we’re facing with GeoWeb standards, the most prevalent, and easy to address, is the lack of tools that interact and convert between these formats. Really, formats don’t matter to users – they have data from one source such as their camera, PND, blog posts, Government agency, etc. and they want to do something with it like understand what’s going on around them, find their favorite restaurant, save the rainforest, provide services, get their car fixed, or just share stories with their family.
Easy to use, engaging, and data agnostic tools are vital for adoption of any formats. Again you only have to look as far as KML’s meteoric rise from application specific format to perhaps the most ubiquitous, and growing, GeoWeb standard due to the compelling reason of “I want to see my house and things going on around the world”.
Why do none of the major RSS news readers really support GeoRSS? Every site should offer KML and Atom output of their data. Mobile devices should allow me to open in whatever mapping interface or app any of my data from any of my services.
Missing Middle Ground
Amongst the plethora of formats, we’re really missing some middle ground. Each of these formats is quite independent of the others, with little cross-pollination and linking occurring.
- Why can’t my KML file link to Atom updates and also to other formats?
- Can OpenSearch describe my tile pyramid?
- How do I describe my path through life, media, events, places I’ve lived, worked, and people I’ve known?
We too easily get caught in either “this format must solve all possible problems” or “it’s good enough so why change it”. In between, we need to converge to understand use cases, and how these formats and specifications can cross various barriers – connecting the experts with the amateurs, the citizens and the authorities, one with another.
GeoWeb Standards Series
- Introduction
- Where We Are
- Problems


August 11th, 2009 at 7:04 pm (#)
Awesome post and excellent points!
August 11th, 2009 at 9:02 pm (#)
Awesome post, as usual, Andrew!
One thing I keep coming back to is the example of the HTML Web. There was no middle ground in the HTML Web. XML was supposed to be that but HTML + JavaScript remain the mainstays of web pages. Complexity was added gradually, not in a single file format.
Now, it remains to be seen if the HTML experience is one that is a useful model, or if we can actually learn from those mistakes.
August 11th, 2009 at 9:46 pm (#)
Thanks Patrick & Mano – I appreciate the feedback.
and Mano – I agree that HTML has been interesting. HTML is very simple – but Javascript gives it a varying complexity that was missing. I can add behaviors and complex relationships between DOM elements using JS. And then you add CSS for styling, embeddable Flash/Silverlight/Canvas, and you actually have a very wide-ranging set of ’standards’ that let developers do everything from a simple “My Homepage!”, to blogs, to web applications that all render in browsers.
What really made these all work was the browser – a common tool that most users didn’t have to care about what format/technology was using, yet developers settled on common standards to get broad browser support (goodbye ActiveX!)
The processing concept is particularly interesting – Javascript having been the quiet giant for many years, but now HTML5 actually offering workers and other processing pieces. Do GeoWeb formats need something similar?
August 11th, 2009 at 9:49 pm (#)
Normally I would take offense to someone pointing out flaws in my boyfriend JSON, but you have a valid point. Though if you do a comparison of KML to JSON with point level data then JSON is far superior in file size.
I did a quick test of point data in Finder: http://finder.geocommons.com/overlays/14326 and KML was 4.8MB compared to 1.1 MB in JSON. I guess all those brackets take up space when you are doing complex geometries though.
August 12th, 2009 at 6:58 am (#)
Great article Andrew!
August 12th, 2009 at 9:52 am (#)
A few years ago it seemed there was lots of talk about geobrowsers, but most of them took the form of thick client apps like Google Earth, WorldWind, AGX, etc. The predominant trend, though, is having geo in the browser vs a geobrowser. I think whatever the common tool is it will need to reside in the browser natively. Goes back to your sentiment that “geo shouldn’t be that special”.
So what is the common tool that resides in the browser that could link to a MIME type to open any of the current standards or the mythical middle standard. So when the browser detects a KML, GeoRSS etc. file what would it enable? Seems it would be a big map provider cage match – proprietary Google, Bing, Yahoo, ESRI – open source OpenLayers, ModestMaps, MapGuide. It is the KML MIME type issues multiplied.
I download Google Earth and my KML opens there. Then I download ESRI’s AGX and it changes the MIME type so my KML opens there. Except now I’m not downloading anything, the app is resident in the browser. So is it Google cutting a deal with Firefox and making it default in Chrome. IE* would obviously be Bing and I guess Safari would be up for grabs till Apple cut a deal or did their own. All this seems to move away from the idea of open standards and towards vendor lock-in. Maybe I’m misinterpreting the common tool concept entirely but to play devil’s advocate there appear to be dragons in yonder ocean.
August 12th, 2009 at 12:20 pm (#)
Andrew, Mano: have you seen Joe Gregorio’s post about “code on demand”: http://bitworking.org/news/355/code-on-demand-rest-and-cloud-computing? I think KML could benefit from javascript (or the like, but why not just javascript?) as much as HTML has.
August 12th, 2009 at 12:36 pm (#)
Sean:
I think there’s a tension between trying to get format X to do everything and bolting on a technology that is specialized for doing it. It certainly has come up, this idea of allowing JS in KML. I think at this point there are two drags on that:
1) The lack of “geo browsers” which, as Mojo points out, today consists of only thick clients. Although much of that thickness comes from the imagery downloads, of course. Better integration with browsers would help (Google Earth Plugin maybe?)
2) The lack of standardization among geo browsers. Right now, Google Earth is the gold standard of how KML is rendered. Ideally, like a standard web browser, all browsers would support all tags, even if there’s some variation among how they implement. Bolting on an additional technology would slow that standardization down even more, and if you think the discussions on how IE and FF differ in JS engines, just wait for the geo browser wars.
The Google Earth API can be seen as an alternative. It is essentially a JS implementation of KML. But it doesn’t get at your point.
August 12th, 2009 at 2:57 pm (#)
Mano – interesting about your viewpoint. Google Earth is definitely pushing forward on very specific extensions to KML for 3D and photo overlay that will fracture generic “geobrowser” support across the board.
That was actually why I wanted to do modules, so everyone could do “KML basic” and maybe KML styling, but leave off on more complicated, solution specific extensions (kind of like or tags in HTML).
Mojo – I agree there are going to be GeoBrowsers, and Browsers with Geo. MiniMap sidebar, a Firefox extension, is one of the better examples of this. I also wrote some GreaseMonkey scripts (GreaseRoute) to pull up Adr markup and show MapQuest directions embedded in the page. You see the same thing happening even in things like Apple Mail or iCal that detect addresses and show a map.
August 12th, 2009 at 11:58 pm (#)
Chuckling over that phrase from Mojo: “a big map provider cage match”. Rooting for the open source underdogs on that one, not holding my breath though. Blue ain’t my best color
Andrew: How would you rate either the Poly9 3d or Ptolemy 3d browser globes & their APIs as candidates for widespread browser globe enablement? Poly9 would get a big ding for not being free/open, but aside from that…
August 13th, 2009 at 3:36 am (#)
Andrew,
Thanks for the excellent overview.
I like your idea of modules. It is practical and follows the approach many GIS manufacturers have pursued – namely the basic platform with modules added dependent upon complexity of user needs. It is a flexible approach and avoids all-in-one confusion.
I guess if there was a GeoBrowser then spatial would be ’special’ hmm?
Keep up the good work.
August 13th, 2009 at 3:21 pm (#)
What about Atom Collections? I disagree that HTML is a good model to follow. Not enough structure in how to specify and organize information. It only works well because a few browser developers have built great software, and content developers have had decades of experience with it. I think Atom Collections (+ opensearch?) could be the answer given better software support and more use/experience on the part of content developers.
August 14th, 2009 at 9:30 am (#)
Ah Raj – skipping to the end?
There are definitely arguments about why HTML has problems – but it’s obviously been incredibly successful, and particularly for the reason that you and I agree on – common tools.
It was also flexible enough, open enough (view source) and discussed openly to evolve as necessary.