

Standards

GeoWeb Standards – Where we are

Published in Geo, Standards


This article is Part 2 in an ongoing series discussing the current state of GeoWeb standards.

I started in the introduction by talking about the general Web and some considerations of how geospatial data standards face unique challenges in resolving to broader data interoperability.

In evaluating the current status of standards, it’s useful to give an overview of each one, with brief thoughts on where it is working and where it needs specific attention.

I will also note that this discussion is focused on web-oriented geospatial data standards. Many other geospatial data formats exist, but they are too esoteric, too proprietary, or not Web-aligned enough to be useful when considering use across the broad Web.

Shapefiles

Of course, with the first example I will slightly bend the statement above. Shapefiles are the bastard geodata denizens of the web. They are annoying in multiple ways, foremost being that they are a proprietary data standard that is entirely too common across geodata portals, especially government portals. However, too much information is shared as Shapefiles, and too many open tools can use them, to ignore their place in the GeoWeb.

Shapefiles are difficult to work with on the web. They are like portable databases, but actually consist of several files: a datastore (dbf), geometries (shp), a geometry index (shx), and optionally a projection definition (prj). Deriving from their use as a binary storage format for desktop software developed by ESRI, they have historic shortcomings such as a 10-character limit on attribute names and restriction to a single geometry type (i.e. they can’t mix lines, points, and polygons).

In addition, with respect to web standards, they have to deal with the multiple files, the lack of a MIME type, and no web characteristics such as linking.
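To make the multiple-file problem concrete, here is a minimal Python sketch that checks which sidecar files sit next to a given .shp. The function name and the choice to treat dbf, shp, and shx as required are my own assumptions for illustration, not part of any Shapefile tooling.

```python
from pathlib import Path

# Extensions named in the text; .prj is the optional projection definition.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)

def shapefile_parts(shp_path):
    """Return (present, missing_required) extensions for one Shapefile."""
    path = Path(shp_path)
    # with_suffix swaps only the final extension, so "roads.shp" -> "roads.dbf".
    present = [ext for ext in REQUIRED + OPTIONAL if path.with_suffix(ext).exists()]
    missing = [ext for ext in REQUIRED if ext not in present]
    return present, missing
```

A web service accepting uploads would need a check like this before it could even open the data, which is exactly the kind of overhead a single-document format avoids.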

Microformat-Geo & Adr

Microformats are a basic attempt at embedding data within generic HTML markup. The geospatial formats include simple 2-D coordinates, or an address; geo and adr respectively.
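As a rough illustration of how little is involved, the sketch below pulls geo coordinates out of HTML using only Python's standard library. The class names geo, latitude, and longitude come from the microformat; the parser class and the sample markup are hypothetical and handle only the simplest case, where each value is the text of its own element.

```python
from html.parser import HTMLParser

class GeoMicroformatParser(HTMLParser):
    """Collect (latitude, longitude) pairs from class="geo" markup."""

    def __init__(self):
        super().__init__()
        self.points = []     # finished (lat, lon) tuples
        self._field = None   # set while inside a latitude/longitude element
        self._current = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "geo" in classes:
            self._current = {}
        for name in ("latitude", "longitude"):
            if name in classes:
                self._field = name

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = float(data.strip())
            self._field = None
            if len(self._current) == 2:
                self.points.append(
                    (self._current["latitude"], self._current["longitude"]))
                self._current = {}

html = ('<p>Press conference at the <span class="geo">White House '
        '(<span class="latitude">38.8977</span>, '
        '<span class="longitude">-77.0365</span>)</span>.</p>')
parser = GeoMicroformatParser()
parser.feed(html)
points = parser.points
```

Note that the parser recovers a coordinate pair and nothing else: the words "press conference" around it are invisible to the format.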

Microformats are nice because they align well within a prevalent data format and allow non-geographic expert users to easily embed information, either directly or via simple tools. Google and Yahoo both openly provide support for Microformats through improved search reliability and even some basic data manipulation tools via APIs. Other tools such as libraries and extensions also provide compelling use of Microformats with geospatial documents.

However, basic limits are that geo only allows for latitude and longitude, without any support for a height. Adr at least can provide more complete information, but neither geo nor adr allows for linking to external geometries, a common shortcoming of most of the formats discussed.

Another problem with Microformats is that they don’t allow linking to context within a document. So while you can include location information in a paragraph, it is not possible to express how this location relates to the rest of an article or narrative.

So, for example, while it’s possible to mark up the location of the White House, one can’t easily denote if this was the location of a press conference, or just that the U.S. President was there, or whatever else may have occurred.

GeoRSS

GeoRSS arose out of the simple desire to include location in the increasingly prevalent RSS and Atom feeds from blogs and news sources. It’s another community driven and owned standard, like Microformats, that met existing needs from a bottom-up approach.

GeoRSS over the past few years has become increasingly common amongst web sites using maps and geospatial data. Google Maps, Yahoo Maps, Microsoft Bing all support exporting and importing data via GeoRSS, and major news outlets such as Reuters, and Al-Jazeera output GeoRSS.

Despite its widespread adoption, GeoRSS has some complexities that arose out of its development. There are 9 potential “flavors” of GeoRSS, although this is largely due to the 3 different formats of feeds: RDF, RSS, and Atom. There are still 3 formats of GeoRSS itself that can be utilized in any of the 3 feed formats: W3C, Simple, and GML. This causes confusion for developers, especially since the W3C format is deprecated but still widely used. Perhaps this is one reason that despite GeoRSS being a simple extension to existing feed formats, there still is no GeoRSS support in any of the major news feed readers, except perhaps limited support in FriendFeed.
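To show what the three encodings mean for a developer, here is a hedged Python sketch that reads a location from an Atom entry regardless of which one the publisher chose. The namespace URIs are the ones commonly used for GeoRSS, W3C geo, and GML; the function name and the sample feed are made up for illustration, and only point geometries are handled.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "gml": "http://www.opengis.net/gml",
}

def entry_location(entry):
    """Return (lat, lon) from a feed entry, or None if no point is found."""
    # GeoRSS-Simple and GeoRSS-GML both store "lat lon" separated by whitespace.
    for path in ("georss:point", "georss:where/gml:Point/gml:pos"):
        text = entry.findtext(path, namespaces=NS)
        if text:
            lat, lon = text.split()
            return float(lat), float(lon)
    # The deprecated W3C encoding uses separate lat and long elements.
    lat = entry.findtext(".//geo:lat", namespaces=NS)
    lon = entry.findtext(".//geo:long", namespaces=NS)
    if lat and lon:
        return float(lat), float(lon)
    return None

FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"
      xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
      xmlns:gml="http://www.opengis.net/gml">
  <entry><title>Simple</title><georss:point>38.9 -77.04</georss:point></entry>
  <entry><title>W3C</title><geo:lat>38.9</geo:lat><geo:long>-77.04</geo:long></entry>
  <entry><title>GML</title>
    <georss:where><gml:Point><gml:pos>38.9 -77.04</gml:pos></gml:Point></georss:where>
  </entry>
</feed>"""

entries = ET.fromstring(FEED).findall("atom:entry", NS)
locations = [entry_location(e) for e in entries]
```

All three entries describe the same point, yet a client has to know three element layouts to read them, and that is before accounting for the RDF and RSS feed variants.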

In addition, GeoRSS hasn’t really advanced in quite a while despite multiple requests and discussions of extensions for multiple locations, time spans, external geometries, and feature identification.

KML

KML, or Keyhole Markup Language, became a de facto standard out of the popularization of Google Earth, formerly Keyhole Earth, and the wide creation and sharing of geographic data to use inside of this compelling 3D geobrowser.

KML offers a rich markup supporting feature locations, attributes, visual styling, 3D models, addresses, and even Atom links. In addition, it is now an OGC standard, and recently Google announced there were more than 500 million KML files and 2 billion KML placemarks, or features (making an average of 4 placemarks per KML file).
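For readers who have not looked inside a KML file, the following Python sketch reads point placemarks from a minimal document. The namespace is the OGC KML 2.2 one; the sample document and the helper name are illustrative assumptions, and styling, 3D models, and the other features listed above are ignored.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

KML_DOC = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>White House</name>
      <Point><coordinates>-77.0365,38.8977,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

def placemarks(kml_text):
    """Yield (name, lon, lat) for each Point placemark in a KML document."""
    root = ET.fromstring(kml_text)
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords:
            # KML orders coordinates lon,lat[,altitude], the reverse of GeoRSS.
            lon, lat, *_ = (float(v) for v in coords.strip().split(","))
            yield name, lon, lat

found = list(placemarks(KML_DOC))
```

The longitude-first ordering is worth noticing, since it is one of the small differences that trips up tools converting between KML and GeoRSS.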

However, KML is very clearly a direct object representation of the Google Earth application. Attribute names follow a rough camel-case convention based on parent or child classes, but sometimes this simple rule is broken in unclear ways, making it difficult for tool developers to create compliant tools. In addition, the styling capability is rudimentary with little true cascading support and no attribute or class styling capabilities.

Google continues to push forward the KML specification with vendor specific, gx:, extensions. The rest of the geospatial community has yet to attempt to influence the spec in any way despite these apparent problems.

GeoJSON

A much more recent community standard, GeoJSON merely adds geographic markup to the JSON format. This is primarily targeted to client to server communication and takes advantage of the compact size, and quick evaluation of JSON data.

GeoJSON nominally followed the GeoRSS definitions, making it easy to understand and to leverage existing tools and knowledge. However, JSON itself does not provide for any actual format specification or schema definition, leaving clients to work out the layout of the JSON from agreed-upon documentation rather than actual standards. This is becoming especially problematic as more services expose GeoJSON via APIs to third-party developers. It is really little more than arbitrary, unique XML without the extensive syntax.
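The lack of a schema means each client ends up writing its own structural checks. A minimal Python sketch of that burden, assuming a Feature with a Point geometry (the function name and the sample document are hypothetical):

```python
import json

def check_point_feature(text):
    """Parse a GeoJSON Feature and verify by hand what a schema would enforce."""
    feature = json.loads(text)
    if feature.get("type") != "Feature":
        raise ValueError("expected a GeoJSON Feature")
    geometry = feature.get("geometry") or {}
    coords = geometry.get("coordinates")
    if geometry.get("type") != "Point" or not isinstance(coords, list) or len(coords) < 2:
        raise ValueError("expected a Point geometry with [lon, lat] coordinates")
    # GeoJSON positions are ordered longitude, latitude.
    return coords[0], coords[1], feature.get("properties") or {}

doc = ('{"type": "Feature", '
       '"geometry": {"type": "Point", "coordinates": [-77.0365, 38.8977]}, '
       '"properties": {"name": "White House"}}')
lon, lat, props = check_point_feature(doc)
```

Nothing in the JSON itself tells the client that these checks are the right ones; that knowledge lives only in the documentation.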

GML

In response to the history of arbitrary, unique XML, Geography Markup Language, GML, was developed. GML follows a very strict and feature-rich mechanism for creating geographic schemas and domain specific semantics. It is used for very precise data interchange, typically over OGC services like WFS. GML is targeted to bridge the span from 1-D to 4-D geometries, multiple domains, and entirely customizable profiles, or versions, depending on a user or developer’s needs.

With GML’s power comes much complexity. Developers are typically required to devise and include their own unique schema definitions when using GML. The scope of writing a generic GML client is akin to writing a Ruby script interpreter and is daunting to general web developers who only want to add simple geographic capabilities to their general services. This complexity hampers its widespread adoption.

Other formats

There are a variety of other formats that are beginning to emerge on the broader web through a variety of fronts.

Spatialite is the set of spatial extensions to the open, portable SQLite format. SQLite is a file database that provides for full relational capabilities in a single file. Spatialite therefore adds geographic columns and rudimentary geospatial query support.

SQLite is already used by a variety of tools such as Google Gears for offline support and Google Maps on the iPhone for storing tiles. We chose Spatialite for our Geocoder due to its compact nature and deployability. Spatialite makes for a very compelling option when you need to have access to an entire geographic database and perform operations on the data.

GeoPDF is working to become an open format. There is a pending OGC adoption of the georegistration embedding, and Adobe is pushing the ISO 32000 spec that includes how to embed vector and geographic drawing. There is still however a very fragmented ecosystem of tools and interoperability that threatens the format as a mechanism for disseminating geographic data.

CAP, Common Alerting Protocol, is a realtime-focused format for sharing out alerts such as emergency news, earthquakes, or municipal signals. It is still an XML format with no real mechanism for ensuring delivery or timeliness, and its advantages over more broadly used and extended formats such as Atom are not clear.

There are still even more formats that are used in the GeoWeb such as CSV, GPX, RDF, and even OpenStreetMap (OSM). However, it is not really worth discussing these here as they are either too generic (CSV), or still too nascent (OSM) to really consider as an existing GeoWeb standard. They will, however, be discussed later in looking to the future of geodata formats.

Also, Semantic data such as Linked Data, RDF, or OWL are continuing to bubble beneath the surface. I will go in depth later on the potentials of semantic geospatial data standards.

Services

Beyond just data formats, there are a number of GeoWeb service, or interface, standards. Open Geospatial Consortium (OGC) dominates this landscape and provides various querying specifications such as Web Feature Service (WFS) and Web Map Service (WMS) in addition to other cataloguing and location-based service interfaces.

WFS and WMS both provide very full-featured capabilities, but also follow older paradigms of interfaces. Fortunately neither of them is SOAP-based; instead they rely on simple query parameters for specifying bounding boxes, layers, formats, and projections. Perhaps the biggest difficulty is that the service description is at the same endpoint as the service itself, and servers often use the wrong MIME types for the documents and errors.

More recently, general web standards organizations such as the World Wide Web Consortium (W3C) have been adopting geospatial additions for browser DOM geolocation, HTTP location information, and privacy. ISO and OASIS are looking at OpenSearch-Geo for possible integration into their harvesting and cataloging standards.

OpenSearch-Geo follows the same concepts as GeoJSON and GeoRSS, providing a simple extension to a broadly adopted interface and merely adding geospatial components to it. In addition, by being only a templating specification, it can easily apply to describing general APIs such as Flickr, KML network links, or even WFS when applying the appropriate template markup.
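The templating idea can be sketched in a few lines of Python. The {searchTerms} placeholder is standard OpenSearch and {geo:box} is, as I understand the OpenSearch-Geo draft, a west,south,east,north bounding box; the host, the other query keys, and the helper function are invented for this example.

```python
import re
from urllib.parse import quote

# Hypothetical URL template of the kind a service description might publish.
TEMPLATE = "http://example.com/search?q={searchTerms}&bbox={geo:box?}&format=atom"

def fill_template(template, **values):
    """Substitute {name} and optional {name?} placeholders in a URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2) == "?"
        key = name.replace(":", "_")   # geo:box is passed as geo_box
        if key in values:
            return quote(str(values[key]), safe=",")
        if optional:
            return ""                  # optional parameters may be left empty
        raise KeyError(f"missing required parameter {name}")
    return re.sub(r"\{([A-Za-z:]+)(\??)\}", replace, template)

url = fill_template(TEMPLATE, searchTerms="coffee shop",
                    geo_box="-77.1,38.8,-76.9,39.0")
```

Because the client only fills in a published template, the same few lines work against any service that describes itself this way, whatever its native query syntax looks like.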

However, while OpenSearch-Geo has garnered a lot of interest, its actual prevalence isn’t clear. There are limited services that offer an explicit, compliant description of their geospatial search interfaces.

The View is Mixed

So the current state of GeoWeb data standards is quite mixed. There is no denying that they have become mainstream. We’re seeing some emergence, and divergence, of more popular formats in Mapufacture and GeoCommons, both on upload as well as downloads or links. Google has released their figures for KML they’ve crawled on the web. By contrast, GeoCommons and Mapufacture rely on users to vet and register data sources, providing a different viewpoint into the utility of geospatial data formats on the web.

The above charts show the composition of data uploads, links, and entire composition of geodata uploaded to GeoCommons and Mapufacture. As an interesting comparison, downloads are definitely trending towards lighter weight standards: KML downloads account for 67.8% of all downloads, with CSVs at 25.7% and Shapefiles at merely 6.3%. It is worth noting that this is merely a narrow viewpoint in the larger web, not accounting for OGC standards, raster data, and services. However, it is still an enlightening consideration in looking at how people are actively engaging with the GeoWeb.

The different formats have all been used extensively, but it isn’t clear when each format is most appropriate. This leads many applications to include multiple formats, an easy and appropriate solution but also one that can confuse users and lead to duplication. We’ll dive into more general problems in the next article.

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability

GeoWeb Standards – Your thoughts

Published in Conference, Geo, Standards


Later this week I’m speaking at GeoWeb about the current progress of GeoWeb standards, how far we have to go, and how to get there. We have KML and GeoRSS leading the way in searchable, linkable formats, but still a plethora of Shapefiles strewn about. There are questions of findability, semantic ontologies, durability, and expressiveness. What is the adoption rate of these formats and their utility in the future real-time, mobile, linked, open web?

What else do you think is the good and bad of GeoWeb standards?


Taking remote imagery offline to Nigeria

Published in Standards


This weekend I was pinged by Todd Huffman regarding a colleague who is deploying to Jos, Nigeria and in search of some good remote imagery to take along.

There were several issues he quickly ran into. Currently imagery is not easy to find for the non-geo expert. The primary interface the broad public is using is Google Earth, which has started doing some nice things with imagery such as time and history.

The second requirement was the ability to run “offline”. Obviously he isn’t going to have any kind of bandwidth for downloading new imagery in the field. Google Earth allows you to increase your cache size, and you can “pre-fly” an area to grab data. You could then store this cache (on my machine it is located at /Users/ajturner/Library/Caches/Google Earth) and burn it to a DVD or copy it to a USB stick if you had to move it to another machine.

Let’s do the ‘right’ thing

However, obviously the above solution is bending the terms of service. Probably not a large concern when you’re deploying to the middle of Nigeria – but all the same, DigitalGlobe probably wouldn’t be too happy if you started sharing this imagery.

So my next suggestion was to look for some other source of data, and the first that came to mind was OpenAerialMap. The imagery isn’t nearly as good here, is still under questionable terms of service from iCubed, and is served to the user in tiles. Granted, OpenAerialMap is a very new and as yet under-served tool. But there are some very valuable lessons from this use case to apply in building out OAM and similar imagery sharing services.

To get the OAM imagery offline there are a few simple approaches. One would be to set up a script to walk the tile pyramid and save the tiles locally. Then use TileCache to serve these off of a disk to a local OpenLayers instance. Not bad, but it adds many new gears and is likely to be difficult to maintain.
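A sketch of the "walk the tile pyramid" step in Python is below. It only enumerates tile indexes for a bounding box; I am assuming the common XYZ Web Mercator tiling here, which may not match how OAM actually laid out its tiles (a TMS-style service, for instance, counts rows from the bottom), and the download and URL layout are deliberately left out.

```python
import math

def tile_xy(lat, lon, zoom):
    """Tile column/row containing a point in the common XYZ (Web Mercator) scheme."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tiles_for_bbox(west, south, east, north, zooms):
    """Yield (zoom, x, y) for every tile covering the bounding box."""
    for z in zooms:
        x_min, y_min = tile_xy(north, west, z)   # top-left tile
        x_max, y_max = tile_xy(south, east, z)   # bottom-right tile
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                yield z, x, y

# Tiles covering a small box around Jos at a few zoom levels.
jos_tiles = list(tiles_for_bbox(8.8, 9.9, 8.9, 10.0, range(0, 4)))
```

Each (zoom, x, y) triple would then be turned into a tile URL and saved to disk. The tile count grows roughly fourfold per zoom level, so it pays to keep the box tight.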

Peek under the covers

So I began investigating the underlying datasource. Zooming way in to Jos reveals that the imagery is being served by a WMS, a Web Map Service.

Clicking on the i-Cubed Landsat copyright link you get the metadata page. The imagery is from 2007, so not very old, and carries a very helpful “License: Unknown”, but with a note that the license issue is being worked on. Under “More information” there is now a valuable link to the WMS “Capabilities Document”.

http://hypercube.telascience.org/cgi-bin/landsat7?
request=GetCapabilities&Service=WMS&Version=1.1.1

If you are not familiar with OGC services, the common recipe is that any service provides a GetCapabilities method. This method will return a service definition that tells you what types of data, formats, versions, and more this particular service provides. Unfortunately, in most browsers clicking on the link downloads a “landsat7” file which does nothing to tell the client (your computer and applications) what kind of file it is. A simple example of why RESTful URLs are much nicer for users. I would have liked to see a landsat7.xml at the very least.

Looking into the file, you can see a number of links prescribed for the various methods. However, another problem with OGC Services is that they never actually link to the method or description. In fact, there are 13 pointers to http://hypercube.telascience.org/cgi-bin/landsat7? and the client (developer, application, etc.) is expected to have read the OGC specification document to know that they need to append the request=, version=, and potentially a number of other parameters to actually call that method.
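In practice the client ends up assembling the real URL itself. A small Python sketch of that step, using the endpoint from the capabilities document and the WMS 1.1.1 parameter names (the helper function is my own, not part of any OGC library):

```python
from urllib.parse import urlencode

ENDPOINT = "http://hypercube.telascience.org/cgi-bin/landsat7?"

def wms_url(endpoint, request, **params):
    """Append the WMS 1.1.1 query parameters that the bare endpoint leaves out."""
    query = {"SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": request}
    query.update(params)
    # Keep ',', ':' and '/' readable in values such as BBOX, SRS and FORMAT.
    return endpoint + urlencode(query, safe=",:/")

capabilities_url = wms_url(ENDPOINT, "GetCapabilities")
```

Other methods work the same way: extra keyword arguments such as LAYERS, SRS, or BBOX are simply added to the query string. None of this is discoverable from the 13 identical links alone.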

Anyways, RESTful OGC services are a bigger discussion and here we were just focusing on getting imagery before deploying.

The next step is to pull out QGIS, and add the WMS URL. You have the option of several layers. Global is good for finding the region and zooming in, and then you can use the original layer for higher-resolution imagery.

Of course, we could have just used the WMS URL to GET the image directly – that is, if you know your WMS spec by heart:

# BBOX is minx,miny,maxx,maxy: longitude first for EPSG:4326 in WMS 1.1.1
curl "http://hypercube.telascience.org/cgi-bin/landsat7?\
request=GetMap&Service=WMS&Version=1.1.1\
&LAYERS=original&STYLES=&SRS=EPSG:4326\
&WIDTH=512&HEIGHT=512\
&FORMAT=image/jpeg\
&BBOX=8.8,9.9,8.9,10" > landsat.jpg


And for more, non-imagery, maps there is Harvard’s new AfricaMap. It points to 46 topographic and historic basemaps that would be useful to download to a drive and use in the field. For example, to get a topomap:

curl "http://cga-3.hmdc.harvard.edu/cgi-bin/mapserv?\
map=/opt/CGA/data/img/uscomp.map\
&request=GetMap&Service=WMS&Version=1.1.1\
&LAYERS=ONC&STYLES=&SRS=EPSG:4326\
&WIDTH=512&HEIGHT=512\
&FORMAT=image/png\
&BBOX=8,8,10,10" > topomap.png

So the answer was – there is some data out there, definitely not as high quality and resolution as the proprietary datasets. And even when there is data, it’s still very difficult to find and utilize. Vector data is finally becoming common (to find, search, and create) but there are still large steps to make raster data as easy to use for non-experts.


OGC Geospatial Search Summit

Published in GeoRSS, OpenSearch, Standards


Last Monday I participated in a Geospatial Search Summit hosted by the OGC as part of the quarterly Technical Committee (TC) meetings. The TC meetings are primarily about various working groups discussing progress and status of standards or interoperability demos.

By comparison, this summit was meant as a brainstorming session around geo and search interfaces and responses. Pulling from the announcement:

We would as much as possible like to bound the discussion to: 1) common ground for geospatial search for web resources and 2) integrating spatial search into search protocols. As part of the discussion we would also like to get advice from the other communities about which catalog/registry search protocol is the ‘mainstream’ one (or more?) that we (OGC) should align with and in turn, be sure that spatial search is supported in a thoughtful but not cumbersome way by the broader IT standards community.

You can see a partial list of attendees here.

There was a good overview of existing, albeit often quite complex, search interfaces. As often happens in meetings like this, where attendees have their own history, investments, and beliefs in standards, the discussion can be difficult to resolve.

A couple of interesting agreements came out of the meeting. Foremost was the guidance to use simple, common formats that already exist when appropriate. This means using OpenSearch as a base URI templating mechanism and following the GeoRSS-Simple specification for geographic data. Of course, a service can expand upon this and offer more complex formats that conform to more complex specs. But providing at least a common baseline means that almost any service can easily interconnect with another service.

One mechanism that is still missing is a way for a geographic search to specify the type of spatial operation. Most services assume a “within” or “intersects”. For example, what restaurants are within a 5-mile radius of my position? However, it’s apparent that this can be confused based on assumptions and also does not provide for any other type of operation. Again, for example, find me all the hospitals that are not within the hurricane path.
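To illustrate what an explicit operation parameter buys you, here is a toy Python sketch of a radius search where the caller states the operation instead of the service assuming it. The operation names, the function signatures, and the sample places are all hypothetical; distance uses the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def search(places, lat, lon, radius_miles, operation="within"):
    """Filter (name, lat, lon) places by an explicitly named spatial operation."""
    if operation not in ("within", "disjoint"):
        raise ValueError(f"unsupported operation: {operation}")
    def inside(place):
        return distance_miles(lat, lon, place[1], place[2]) <= radius_miles
    if operation == "within":
        return [p[0] for p in places if inside(p)]
    return [p[0] for p in places if not inside(p)]

places = [("near", 38.90, -77.04), ("far", 39.29, -76.61)]
nearby = search(places, 38.8977, -77.0365, 5)
outside = search(places, 38.8977, -77.0365, 5, operation="disjoint")
```

The same data and the same circle give opposite answers depending on the operation, which is why leaving it implicit invites confusion.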

A long-standing model for this is called the DE-9IM spatial operation set. It was presented by Eliseo Clementini, and also frequently attributed to Egenhofer. You can read more about it. Granted, a majority of geospatially-capable search interfaces may not require this, but it’s nice that there is a relatively straightforward model that everyone can agree on.
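For the curious, DE-9IM describes the relationship between two geometries as a 9-character matrix string, and each named operation is a pattern over that string. A minimal Python matcher is sketched below; the two patterns are the standard ones for within and disjoint, while computing the matrix for real geometries is left to a geometry library and not shown.

```python
# Standard DE-9IM patterns for two of the named predicates.
PATTERNS = {
    "within": "T*F**F***",
    "disjoint": "FF*FF****",
}

def matches(matrix, pattern):
    """Check a 9-character DE-9IM matrix string against a pattern."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM strings have exactly 9 characters")
    for cell, want in zip(matrix, pattern):
        if want == "*":
            continue                     # any value is acceptable
        if want == "T" and cell in "012":
            continue                     # T means a non-empty intersection
        if want == cell:
            continue                     # F, 0, 1 or 2 must match exactly
        return False
    return True

# Matrix for a polygon lying strictly inside another polygon.
inside_matrix = "2FF1FF212"
is_within = matches(inside_matrix, PATTERNS["within"])
```

A search interface could accept either a named operation or a raw pattern, which would cover the hospital example above without inventing new vocabulary.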

I hope more attendees share their thoughts and outcomes. There are definitely many who point out the problems of designing standards in a smoke-filled room, and I would much rather bring the discussion out into the open where more people can chime in and contribute.


Privacy and Permissions in Feeds

Published in Standards


GeoRSS has gained a lot of popularity and has recently shown up in some user tracking applications like Dopplr and Plazes. This means you can drop the feeds into your feed reader – or GeoRSS widget or aggregator to mix in your friends’ or family members’ locations with your other feeds.

However, once you pull the feed out of the original service, you should begin to wonder about the privacy. Many of these services require authorization before allowing you to pull down feeds. This way they can make some assurance that only allowed people can grab your location feed. However, once the feed is pulled out, it is out of the hands of an authorization system and has a very easy potential to be made unwittingly public.

The onus of security is on the application or aggregator that pulled the feed on behalf of the authorized user. But at the same time, once the feed has been retrieved, there is no storage of the authorization credentials with the feed itself. It has essentially been stripped of its shell of potential privacy, and looking at the feed itself you would have no idea if it was supposed to be kept private and visible only to certain, unknown persons.

What would be nice would be a mechanism to store at least references to permissions and authorization credentials within the feed itself. That way if an application still has the feed, or wishes to store it and re-aggregate it, they can apply the same authorization as the feed originally had.

Existing Mechanisms?

Brian Suda pointed me to the currently suspended Platform for Privacy Preferences. But this appears to be a rather heavy-handed approach. The IETF GeoPriv Working Group is also looking at location privacy but not in terms of feeds, and the idea of permissions and privacy isn’t specific to location (though that is typically where it gets a large amount of attention).

Simple Solutions

I’m wondering if there exists, or could easily be formulated, an additional markup in Atom to specify permissions. It would still be the responsibility of the application to abide by these permissions – but at least they would have the information necessary to do so.

Here is a possible solution. Provide a default access (private), but then refer to authorization endpoints for who would be allowed to view this feed. In this example, if the user can provide OpenID authorization to this URL, then they can view the feed:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" >
   <title>Andrew's location</title>
   <permissions>
     <default access="none"/>
     <permission access="view" href="http://myopenid.com/bobdingle" type="openid"/>
   </permissions>
   ...
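To sketch how an aggregator might honor such markup, the Python below reads the proposed permissions block and decides whether an already-authenticated identity may view the feed. The element and attribute names are the ones from my proposal above, not an existing Atom extension; the functions are hypothetical and no actual OpenID verification is performed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Andrew's location</title>
  <permissions>
    <default access="none"/>
    <permission access="view" href="http://myopenid.com/bobdingle" type="openid"/>
  </permissions>
</feed>"""

def read_permissions(feed_text):
    """Return (default_access, [(access, type, href), ...]) from a feed."""
    root = ET.fromstring(feed_text)
    perms = root.find(ATOM + "permissions")
    if perms is None:
        return None, []            # the feed carries no permission information
    default = perms.find(ATOM + "default")
    default_access = default.get("access") if default is not None else None
    grants = [(p.get("access"), p.get("type"), p.get("href"))
              for p in perms.findall(ATOM + "permission")]
    return default_access, grants

def may_view(feed_text, identity_url):
    """True if the (already authenticated) identity is allowed to view the feed."""
    default_access, grants = read_permissions(feed_text)
    if any(access == "view" and href == identity_url
           for access, _type, href in grants):
        return True
    return default_access == "view"
```

With the permissions stored in the feed, an application that re-aggregates it has enough information to keep it private, even though enforcement still depends on the application behaving well.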