This article is Part 2 in an ongoing series discussing the current state of GeoWeb standards.
I started in the introduction by talking about the general Web and the unique challenges geospatial data standards face in achieving broader data interoperability.
Before evaluating where the standards stand, it's useful to give an overview of each one, with brief thoughts on where it works well and where it needs attention.
This discussion focuses on web-oriented geospatial data standards. Many other geospatial data formats exist, but they are too esoteric, too proprietary, or too poorly aligned with the Web to be useful when considering the broad Web.
Of course, the first example slightly bends the statement above. Shapefiles are the bastard geodata denizens of the web. They are annoying in multiple ways, foremost being that they are a proprietary data format that is entirely too common across geodata portals, especially government portals. However, too much information is shared as Shapefiles, and too many open tools can read them, to ignore their place in the GeoWeb.
Shapefiles are difficult to work with on the web. They are like portable databases, but actually consist of several files: the attribute datastore (dbf), the geometries (shp), an index into the geometry records (shx), and optionally a projection definition (prj). Because they derive from a binary storage format developed by ESRI for desktop software, they have historic shortcomings such as a 10-character limit on attribute names and a restriction to a single geometry type per file (i.e. you can't mix lines, points, and polygons).
With respect to web standards, they also suffer from the multiple files, the lack of a MIME type, and the absence of web characteristics such as linking.
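To make the multi-file problem concrete, here is a minimal Python sketch that checks whether the pieces of a Shapefile actually arrived together. The function names and the `roads.shp` path are hypothetical, and it only looks at the four extensions named above; real-world Shapefiles may carry additional sidecar files.

```python
from pathlib import Path

# Component files named in the text; .prj is optional.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)


def shapefile_parts(shp_path):
    """Map each expected extension to True/False depending on whether the file exists."""
    shp = Path(shp_path)
    return {ext: shp.with_suffix(ext).exists() for ext in REQUIRED + OPTIONAL}


def missing_required(shp_path):
    """List the required component extensions that are absent next to shp_path."""
    parts = shapefile_parts(shp_path)
    return [ext for ext in REQUIRED if not parts[ext]]


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(missing_required("roads.shp"))
```

A web client has to fetch (or unzip) every one of these before it can do anything useful, which is exactly the overhead a single-document format avoids.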
Microformat-Geo & Adr
Microformats are a basic attempt at embedding data within generic HTML markup. The geospatial formats cover simple 2-D coordinates and addresses: geo and adr respectively.
Microformats are nice because they align well within a prevalent data format and allow non-geographic expert users to easily embed information, either directly or via simple tools. Google and Yahoo both openly provide support for Microformats through improved search reliability and even some basic data manipulation tools via APIs. Other tools such as libraries and extensions also provide compelling use of Microformats with geospatial documents.
However, a basic limit is that geo only allows for latitude and longitude, without any support for a height. Adr can at least provide more complete information, but neither geo nor adr allows for linking to external geometries, a common shortcoming of most of the formats discussed.
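As a rough illustration of how little geo carries, the following Python sketch pulls latitude and longitude out of a geo-marked HTML fragment using only the standard library. The sample markup and the approximate White House coordinates are my own illustration, and the parser is deliberately naive: it assumes the values appear as element text and ignores other patterns such as values stored in a `title` attribute.

```python
from html.parser import HTMLParser

# Illustrative fragment; coordinates are approximate.
SAMPLE = (
    '<p>The announcement was made at '
    '<span class="geo">'
    '<span class="latitude">38.8977</span>, '
    '<span class="longitude">-77.0365</span>'
    '</span>.</p>'
)


class GeoParser(HTMLParser):
    """Collect (latitude, longitude) pairs from geo microformat markup."""

    def __init__(self):
        super().__init__()
        self.points = []
        self._field = None   # set while inside a latitude/longitude element
        self._current = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in ("latitude", "longitude"):
            if name in classes:
                self._field = name

    def handle_data(self, data):
        if self._field is None:
            return
        self._current[self._field] = float(data.strip())
        self._field = None
        if len(self._current) == 2:
            self.points.append(
                (self._current["latitude"], self._current["longitude"])
            )
            self._current = {}


parser = GeoParser()
parser.feed(SAMPLE)
print(parser.points)
```

Two numbers are all there is to extract: no height, no geometry beyond a point, and nothing that says what happened at that location.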
Another problem with Microformats is that they don't allow linking to context within a document. So while you can include location information in a paragraph, it is not possible to express how this location relates to the rest of an article or narrative.
So, for example, while it's possible to mark up the location of the White House, one can't easily denote whether this was the location of a press conference, or just that the U.S. President was there, or whatever else may have occurred.
GeoRSS arose out of the simple desire to include location in the increasingly prevalent RSS and Atom feeds from blogs and news sources. It's another community driven and owned standard, like Microformats, that met existing needs from a bottom-up approach.
Over the past few years GeoRSS has become increasingly common amongst web sites using maps and geospatial data. Google Maps, Yahoo Maps, and Microsoft Bing all support exporting and importing data via GeoRSS, and major news outlets such as Reuters and Al-Jazeera output GeoRSS.
Despite its widespread adoption, GeoRSS has some complexities that arose out of its development. There are 9 potential "flavors" of GeoRSS. This is partly due to the 3 different feed formats (RDF, RSS, and Atom), but there are also 3 encodings of GeoRSS itself that can be used in any of those feed formats: W3C, Simple, and GML. This causes confusion for developers, especially since the W3C format is deprecated but still widely used. Perhaps this is one reason that, despite GeoRSS being a simple extension to existing feed formats, there still is no GeoRSS support in any of the major news feed readers, except perhaps limited support in FriendFeed.
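The three encodings can be sketched side by side. The Python below is a minimal illustration, not a complete GeoRSS reader: it handles only single points, the `<items>` wrapper is a stand-in for a real RSS or Atom feed, and the sample coordinates are arbitrary. The namespace URIs are the ones commonly used for each encoding.

```python
import xml.etree.ElementTree as ET

NS = {
    "georss": "http://www.georss.org/georss",          # Simple and GML wrapper
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",  # W3C Basic Geo
    "gml": "http://www.opengis.net/gml",
}

# Stand-in feed: one entry per encoding, all describing the same point.
FEED = """<items xmlns:georss="http://www.georss.org/georss"
       xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
       xmlns:gml="http://www.opengis.net/gml">
  <item><georss:point>45.256 -71.92</georss:point></item>
  <item><georss:where><gml:Point><gml:pos>45.256 -71.92</gml:pos></gml:Point></georss:where></item>
  <item><geo:lat>45.256</geo:lat><geo:long>-71.92</geo:long></item>
</items>"""


def item_point(item):
    """Return (lat, lon) from an entry in any of the three encodings, or None."""
    # Simple and GML both store "lat lon" as whitespace-separated text.
    for path in ("georss:point", "georss:where/gml:Point/gml:pos"):
        node = item.find(path, NS)
        if node is not None and node.text:
            lat, lon = node.text.split()
            return float(lat), float(lon)

    # W3C stores latitude and longitude as separate elements.
    lat, lon = item.find("geo:lat", NS), item.find("geo:long", NS)
    if lat is not None and lon is not None:
        return float(lat.text), float(lon.text)
    return None


points = [item_point(item) for item in ET.fromstring(FEED).findall("item")]
print(points)
```

Every consumer ends up writing some version of this three-way branch, and that is before accounting for the differences between the RDF, RSS, and Atom containers.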
In addition, GeoRSS hasn't really advanced in quite a while despite multiple requests and discussions of extensions for multiple locations, time spans, external geometries, and feature identification.
KML, or Keyhole Markup Language, became a de facto standard out of the popularization of Google Earth, formerly Keyhole Earth, and the wide creation and sharing of geographic data to use inside of this compelling 3D geobrowser.
KML offers a rich markup supporting feature locations, attributes, visual styling, 3D models, addresses, and even Atom links. In addition, it is now an OGC standard, and recently Google announced there were more than 500 million KML files and 2 billion KML placemarks, or features (making an average of 4 placemarks per KML file).
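For readers who haven't looked inside a KML file, here is a minimal Python sketch that reads point placemarks from a small document. The sample document is my own illustration and the parser ignores everything except a placemark's name and point coordinates. Note that KML orders coordinates as longitude, latitude, then an optional altitude.

```python
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_URI}

# Illustrative document; coordinates are approximate.
DOC = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>White House</name>
      <Point><coordinates>-77.0365,38.8977,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def placemarks(kml_text):
    """Yield (name, lat, lon, altitude) for each point placemark."""
    root = ET.fromstring(kml_text)
    for pm in root.iter("{%s}Placemark" % KML_URI):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # not a point placemark
        lon, lat, *rest = (float(v) for v in coords.strip().split(","))
        yield name, lat, lon, (rest[0] if rest else None)


for placemark in placemarks(DOC):
    print(placemark)
```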
However, KML is very clearly a direct object representation of the Google Earth application. Attribute names follow a rough camel-case convention based on parent or child classes, but sometimes this simple rule is broken in unclear ways, making it difficult for tool developers to create compliant tools. In addition, the styling capability is rudimentary with little true cascading support and no attribute or class styling capabilities.
Google continues to push forward the KML specification with vendor-specific gx: extensions. The rest of the geospatial community has yet to attempt to influence the spec in any way despite these apparent problems.
A much more recent community standard, GeoJSON merely adds geographic markup to the JSON format. It is primarily targeted at client-to-server communication and takes advantage of the compact size and quick evaluation of JSON data.
GeoJSON nominally followed the GeoRSS definitions, making it easy to understand and to leverage existing tools and knowledge. However, JSON itself does not provide any actual format specification or schema definition, leaving clients to work out the layout of the JSON from agreed-upon documentation rather than from an enforceable standard. This is becoming especially problematic as more services expose GeoJSON via APIs to third-party developers. It is really little more than arbitrary, unique XML without the extensive syntax.
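Because there is no schema to validate against, each client ends up hand-checking the structure it receives. The Python sketch below shows what that looks like for a single Feature. It is a simplified check of my own, not a full validator: it covers only the six basic geometry types and treats a missing geometry or properties object as a problem, which is stricter than the GeoJSON documentation requires.

```python
import json

GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString",
    "MultiLineString", "Polygon", "MultiPolygon",
}

# Illustrative feature; GeoJSON orders coordinates as [longitude, latitude].
SAMPLE = """{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-77.0365, 38.8977]},
  "properties": {"name": "White House"}
}"""


def check_feature(obj):
    """Return a list of structural problems found in a GeoJSON-style Feature."""
    problems = []
    if obj.get("type") != "Feature":
        problems.append("type must be 'Feature'")

    geometry = obj.get("geometry")
    if not isinstance(geometry, dict):
        problems.append("geometry must be an object")
    else:
        if geometry.get("type") not in GEOMETRY_TYPES:
            problems.append("unknown geometry type: %s" % geometry.get("type"))
        if not isinstance(geometry.get("coordinates"), list):
            problems.append("coordinates must be an array")

    if not isinstance(obj.get("properties"), dict):
        problems.append("properties must be an object")
    return problems


print(check_feature(json.loads(SAMPLE)))
```

An empty list means the feature passed; anything else is the kind of mismatch that only surfaces at runtime when two services interpret the documentation differently.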
In response to the history of arbitrary, unique XML, Geography Markup Language, GML, was developed. GML follows a very strict and feature-rich mechanism for creating geographic schemas and domain specific semantics. It is used for very precise data interchange, typically over OGC services like WFS. GML is targeted to bridge the span from 1-D to 4-D geometries, multiple domains, and entirely customizable profiles, or versions, depending on a user or developer's needs.
With GML's power comes much complexity. Developers are typically required to devise and include their own unique schema definitions when using GML. The scope of writing a generic GML client is akin to writing a Ruby script interpreter, and it is daunting to general web developers who only want to add simple geographic capabilities to their services. This complexity hampers its widespread adoption.
There are a variety of other formats that are beginning to emerge on the broader web through a variety of fronts.
Spatialite is the set of spatial extensions to the open, portable SQLite format. SQLite is a file database that provides for full relational capabilities in a single file. Spatialite therefore adds geographic columns and rudimentary geospatial query support.
SQLite is already used by a variety of tools, such as Google Gears for offline support and Google Maps on the iPhone for storing tiles. We chose Spatialite for our Geocoder due to its compact nature and deployability. Spatialite makes for a very compelling option when you need to have access to an entire geographic database and perform operations on the data.
GeoPDF is working to become an open format. There is a pending OGC adoption of the georegistration embedding, and Adobe is pushing the ISO 32000 spec that includes how to embed vector and geographic drawing. There is still however a very fragmented ecosystem of tools and interoperability that threatens the format as a mechanism for disseminating geographic data.
CAP, Common Alerting Protocol, is a realtime-focused format for sharing alerts such as emergency news, earthquakes, or municipal signals. It is still an XML format with no real mechanism for ensuring delivery or timeliness, and its advantages over more broadly used and extended formats such as Atom are not clear.
There are still even more formats that are used in the GeoWeb such as CSV, GPX, RDF, and even OpenStreetMap (OSM). However, it is not really worth discussing these here as they are either too generic (CSV), or still too nascent (OSM) to really consider as an existing GeoWeb standard. They will, however, be discussed later in looking to the future of geodata formats.
Also, semantic data approaches such as Linked Data, RDF, and OWL continue to bubble beneath the surface. I will go in depth later on the potential of semantic geospatial data standards.
Beyond just data formats, there are a number of GeoWeb service, or interface, standards. Open Geospatial Consortium (OGC) dominates this landscape and provides various querying specifications such as Web Feature Service (WFS) and Web Map Service (WMS) in addition to other cataloguing and location-based service interfaces.
WFS and WMS both provide very full-featured capabilities, but also follow older paradigms of interfaces. Fortunately neither of them is SOAP-based; instead they rely on simple query parameters for specifying bounding boxes, layers, formats, and projections. Perhaps the biggest difficulty is that the service description is at the same endpoint as the service itself, and servers often use the wrong MIME types for the documents and errors.
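The query-parameter style is easy to show. The Python sketch below builds a WMS 1.1.1 GetMap request and the matching GetCapabilities request for the same endpoint. The endpoint URL and layer names are hypothetical, the helper names are mine, and the code assumes the endpoint has no existing query string.

```python
from urllib.parse import urlencode


def getmap_url(endpoint, layers, bbox, width, height,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


def capabilities_url(endpoint):
    """The service description lives at the same endpoint, selected by REQUEST."""
    return endpoint + "?" + urlencode(
        {"SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetCapabilities"}
    )


# Hypothetical endpoint and layers, for illustration only.
url = getmap_url("https://example.com/wms", ["roads", "rivers"],
                 (-180, -90, 180, 90), 800, 400)
print(url)
print(capabilities_url("https://example.com/wms"))
```

The only thing distinguishing a map image from the service description is the REQUEST parameter, which is why a client cannot tell from the URL alone what kind of document it will get back.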
More recently, general web standards organizations such as the World Wide Web Consortium (W3C) have been adopting geospatial additions for browser DOM geolocation, HTTP location information, and privacy. ISO and OASIS are looking at OpenSearch-Geo for possible integration into their harvesting and cataloging standards.
OpenSearch-Geo follows the same concepts as GeoJSON and GeoRSS, providing a simple extension to a broadly adopted interface and merely adding geospatial components to it. In addition, because it is only a templating specification, it can easily describe general APIs such as Flickr, KML network links, or even WFS once the appropriate template markup is applied.
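To show what "only a templating specification" means in practice, here is a minimal Python sketch that fills in an OpenSearch-style URL template containing a `geo:box` placeholder. The template URL is hypothetical and the `fill_template` helper is my own; it handles only simple `{name}` and optional `{name?}` placeholders, and assumes the box is given as west,south,east,north.

```python
import re
from urllib.parse import quote

# Hypothetical search template; a trailing "?" marks an optional parameter.
TEMPLATE = "https://example.com/search?q={searchTerms}&bbox={geo:box?}&format=atom"

PLACEHOLDER = re.compile(r"\{([^{}?]+)(\?)?\}")


def fill_template(template, values):
    """Substitute placeholders; optional ones become empty when not supplied."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Keep commas readable so a bounding box stays west,south,east,north.
            return quote(str(values[name]), safe=",")
        if optional:
            return ""
        raise KeyError("missing required parameter: %s" % name)

    return PLACEHOLDER.sub(substitute, template)


print(fill_template(TEMPLATE, {
    "searchTerms": "coffee shops",
    "geo:box": "-77.1,38.8,-76.9,39.0",
}))
```

The same substitution step works regardless of what sits behind the URL, which is what lets one description format cover very different services.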
However, while OpenSearch-Geo has garnered a lot of interest, how widely it is actually used isn't clear. Few services offer an explicit, compliant description of their geospatial search interfaces.
The View is Mixed
So the current state of GeoWeb data standards is quite mixed. There is no denying that they have become mainstream. We're seeing some emergence, and divergence, of the more popular formats in Mapufacture and GeoCommons, both on uploads and on downloads or links. Google has released their figures for KML they've crawled on the web. By contrast, GeoCommons and Mapufacture rely on users to vet and register data sources, providing a different viewpoint into the utility of geospatial data formats on the web.
The above charts show the composition of data uploads, links, and the overall composition of geodata uploaded to GeoCommons and Mapufacture. As an interesting comparison, downloads are definitely trending towards lighter-weight standards: KML accounts for 67.8% of all downloads, with CSVs at 25.7% and Shapefiles at merely 6.3%. It is worth noting that this is a narrow viewpoint into the larger web, not accounting for OGC standards, raster data, and services. However, it is still an enlightening consideration in looking at how people are actively engaging with the GeoWeb.
The different formats have all been used extensively, but it isn't clear when each format is most appropriate. This leads many applications to support multiple formats, an easy and appropriate solution but also one that can confuse users and lead to duplication. We'll dive into more general problems in the next article.