

Geo

GeoWeb Standards – Where we need to go

Published in Geo, Standards


In previous articles on the status of the GeoWeb, I highlighted the myriad options and problems with current GeoWeb standards and interfaces. Overall, it’s clear that publishing and sharing geospatial data in a web-oriented way is still nascent, but the practice is improving as it becomes more mainstream. More data is being created and published in web-oriented ways that make it more consumable and usable.

Too often, standards and tools are designed by domain experts and technologists in ways that lead to overly complex and irrelevant formats, which become a burden and introduce as many problems as they try to solve.

What they’re often not considering is the end-user experience. Who are the users, what are they trying to achieve, and how can these formats make these tools better and easier to use?

Granted, there are expert users: people who really want to make intricately related, projected, spatially and spectrally bounded queries into data and feed the results into advanced analytics engines. But they are not the majority, and they are not what is driving the long-term demand on the GeoWeb (you can use ‘long-tail’ here if you would like). Who are the users that want to engage with this information on a daily basis in their personal lives, businesses, families, safety, governance, and goals?

Grassroots is an option

I’m a very big fan of grassroots organization and emergent structures. The needs tend to grow from real demand, and solutions are built through actual demonstrated benefit and impact. They are agile, evolutionary, and garner broad support amongst users and developers. These are all aspects that are beneficial to achieving standards that meet the needs of end users and provide good experiences.

However, it is not the only solution. Grassroots efforts tend to look at immediate needs and may not account for more distant issues and anticipated needs. They seek broad appeal and “good enough” rather than encompassing every aspect of every interested domain. Top-down, industry-derived, committee-driven standards provide more directed needs and objectives that can serve different types of users.

So the solution is a hybrid: grassroots solutions are encouraged as demonstrators of emergent needs, and are then accepted and supported by more formal organizations.

Conversation is required

But we also need to open up the conversation beyond just technologists and experts. We need to engage with and understand users, and not merely from the angle of “how do I sell them more of my coffee?” but “what can I do to make their lives better?” That means actually asking them questions and engaging them in dialogue.

This technique of user stories and engagement is neither new nor unused. However, it appears to be missing from GeoWeb standards development. We’ve been designing standards for ourselves first and then foisting them upon others. Instead, we need to understand users’ needs and issues, and then apply our expert knowledge to approach solutions properly.

Other articles in the GeoWeb Standards series:

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability

GeoWeb Standards – Current Problems

Published in Geo, Standards


Part 3 in a look at the current state of GeoWeb Standards. See the introduction (Part 1) for background.

It’s time to take a hard look across the board at where we’re coming up short and the issues that need to be addressed. One way to summarize:

GeoRSS, KML, and GeoJSON are the itching powder, squirting ink pen, and dribble cup of geodata formats.
Sean Gillies

Sean is known for his candor, and his viewpoint has merit. Overall, the various formats and standards fulfill different needs, but they still don’t cover all use cases, align well with best practices, or make sense to users and developers.

The simplest overall problem with many of these formats, and how they fit into the Web, is that they lack proper web-type descriptions. One primary mechanism Web clients use to decide how to present data is the MIME type. MIME types let the server notify clients that the data is in a format such as XML, plain text, a PNG image, and so on. Standard types must be formally registered, but ad-hoc or vendor-specific types are also common.

In addition, MIME types allow crawlers and registries to easily record the type of the file in the metadata.

Looking over our list of various GeoWeb standards, it’s very easy to identify which formats abide by this and which don’t.

Atom, JSON, HTML, and SQLite all provide format-specific MIME types, allowing clients to easily employ the proper applications. However, none provides a special mechanism for notifying that the data includes geospatial markup. This is not necessarily a problem: geo shouldn’t be that special.

KML is perhaps the only format that has a geospatial-specific MIME type. However, despite KML now being an OGC standard, its MIME type is still vendor-specific: application/vnd.google-earth.kml+xml. KML was also particularly ingenious in providing a distinct MIME type for the compressed, or zipped, format: application/vnd.google-earth.kmz.

GML is served as generic XML, so its MIME type does nothing to tell a client that it should pass the data on to a geo-enabled application. Shapefiles are an agglomeration of multiple files, and even when zipped up they are only marked as compressed files.

More broadly, for services the OGC has a MIME type for service descriptions and responses, application/vnd.ogc.wms_xml, while errors have their own MIME type, application/vnd.ogc.se_xml.

OpenSearch has a special MIME type, and tiles and image files obviously have MIME types.
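To make the comparison concrete, here is a minimal Python sketch of how a client or crawler might map the Content-Type header it receives to a handler. The table only lists the MIME types named above plus the standard Atom and OpenSearch description types; the category labels and the `classify` function are hypothetical.

```python
# Map MIME types to a rough handling category. The categories
# ("geo", "feed", ...) are hypothetical labels for illustration only.
GEO_MIME_TYPES = {
    "application/vnd.google-earth.kml+xml": "geo",      # KML
    "application/vnd.google-earth.kmz": "geo",          # zipped KML
    "application/vnd.ogc.wms_xml": "ogc-service",       # WMS capabilities
    "application/vnd.ogc.se_xml": "ogc-error",          # OGC service exception
    "application/atom+xml": "feed",                     # Atom (may carry GeoRSS)
    "application/opensearchdescription+xml": "search",  # OpenSearch description
    "application/json": "data",                         # JSON (may carry GeoJSON)
    "application/xml": "data",                          # generic XML, e.g. GML
    "text/xml": "data",
    "application/zip": "archive",                       # e.g. a zipped Shapefile
}


def classify(content_type: str) -> str:
    """Return a handling category for an HTTP Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return GEO_MIME_TYPES.get(mime, "unknown")


if __name__ == "__main__":
    for header in ("application/vnd.google-earth.kml+xml",
                   "application/xml; charset=utf-8",
                   "application/zip"):
        print(header, "->", classify(header))
```

Only the KML types signal geospatial content on their own; GML and zipped Shapefiles land in generic buckets, which is exactly the gap described above.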

Doesn’t matter if you can’t download it

Another major issue facing many of the GeoWeb formats is file size. Generally, the web swings back and forth between disregarding size, on the assumption of ubiquitous, reliable high-speed connectivity, and trying to speed up pages. Even more important, many potential users don’t have access to high-speed internet, so there is a huge difference between 10 KB, 100 KB, and 1 MB of data.

To compare the sizes, I took a relatively large dataset from GeoCommons, Statistics Canada’s “Land and freshwater area, Canada, 2005”, and exported it in a variety of formats, both uncompressed and compressed with standard zip algorithms.

Format        Size      Zipped
CSV           1.3 KB
Shapefile     5.4 MB    3.6 MB
GeoRSS        3.3 MB    1.1 MB
KML           7.3 MB    2.4 MB
Spatialite    5.4 MB    3.6 MB
JSON          7.9 MB    2.3 MB

The CSV includes just latitude and longitude columns for each feature’s centroid, so it is obviously not fully representative. One option would be to include the EWKB in a column for the full geometry, but that is far from any kind of ‘standard’ that other tools would know how to interpret.

Perhaps most surprising in these results is how large the JSON is. The notation for complex geometries requires a lot of punctuation, which adds up when representing polygonal data.
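The post doesn’t say how the centroids for the CSV export were computed. As a sketch under that uncertainty, the snippet below uses the standard area-weighted (shoelace) centroid of a polygon ring and writes one latitude/longitude row per feature. It works directly on longitude/latitude as planar coordinates, so it ignores projection and is only an approximation; the `(name, ring)` input shape is hypothetical.

```python
import csv
import io


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring [(x, y), ...].

    Applies the planar shoelace formula directly to longitude/latitude,
    so it ignores map projection and is only an approximation.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * A); with A = area2 / 2 that is sum / (3 * area2).
    return cx / (3 * area2), cy / (3 * area2)


def features_to_csv(features):
    """Write (name, ring) pairs as CSV rows of centroid latitude/longitude."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "latitude", "longitude"])
    for name, ring in features:
        lon, lat = polygon_centroid(ring)
        writer.writerow([name, round(lat, 6), round(lon, 6)])
    return out.getvalue()
```

Everything except the centroid point is thrown away, which is why the CSV row in the table is so much smaller than the other formats.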
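A rough way to see how the punctuation adds up is to serialize the same polygon ring as a GeoJSON geometry and as WKT, a plainer text notation used here only as a point of comparison (it is not one of the formats in the table). The circular test ring is made up; the real JSON export also carries attributes and its own coordinate precision, so this only isolates the syntax overhead.

```python
import json
import math


def circle_ring(n, radius=1.0):
    """Hypothetical test polygon: n points on a circle, as a closed ring."""
    pts = [[round(radius * math.cos(2 * math.pi * i / n), 6),
            round(radius * math.sin(2 * math.pi * i / n), 6)] for i in range(n)]
    return pts + [pts[0]]


def as_geojson(ring):
    # Every point becomes "[x, y]": brackets plus a comma per coordinate pair.
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


def as_wkt(ring):
    # Every point becomes "x y": just a space between the coordinates.
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


ring = circle_ring(1000)
gj, wkt = as_geojson(ring), as_wkt(ring)
print(len(gj), len(wkt), len(gj) - len(wkt))
```

With Python’s default JSON separators, each point costs three extra characters in GeoJSON, so the gap grows linearly with the number of vertices in polygonal data.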

Linkability, Durability, and Discoverability

Moving past file formats and data types brings up the issue of discoverability and linkability in GeoWeb standards. The Web is more than a list of documents that mention resources: its documents actually link to durable endpoints that can be resolved, queried, accessed, and parsed.

Non-web-native formats have no concept of linking. CSV, Shapefiles, and SQLite contain data, but no links. By contrast, Atom, GML, and KML are chock-full of links, although not always used to great effect. JSON can contain links, but without a schema, who knows what a link means.

The best model to follow here is HTML, which provides standard ways to link to feeds, OpenSearch description documents, pages, media, styles, and scripts.
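As a sketch of what HTML offers, the parser below collects `<link rel="alternate">` and `<link rel="search">` elements, the conventions used for feed and OpenSearch autodiscovery, using only Python’s standard `html.parser`. The sample page, its paths, and the idea of advertising a KML file as an alternate are hypothetical.

```python
from html.parser import HTMLParser


class AutodiscoveryParser(HTMLParser):
    """Collect <link rel="alternate"> and <link rel="search"> elements."""

    WANTED_RELS = {"alternate", "search"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values; an attribute with
        # no value comes through as None.
        rels = set((attr.get("rel") or "").lower().split())
        if rels & self.WANTED_RELS and attr.get("href"):
            self.links.append({
                "rel": " ".join(sorted(rels)),
                "type": attr.get("type"),
                "href": attr["href"],
                "title": attr.get("title"),
            })


def discover(html: str):
    parser = AutodiscoveryParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page advertising an Atom feed, a KML file, and a search interface.
PAGE = """<html><head>
<link rel="alternate" type="application/atom+xml" title="Geotagged posts" href="/feed.atom">
<link rel="alternate" type="application/vnd.google-earth.kml+xml" href="/places.kml">
<link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml">
<link rel="stylesheet" href="/style.css">
</head><body></body></html>"""

for link in discover(PAGE):
    print(link["rel"], link["type"], link["href"])
```

The stylesheet link is skipped, and each discovered link carries its MIME type, so a client can decide which alternate representation to fetch before downloading anything.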

However, what happens when a resource disappears and is no longer resolvable? How do you know where else to get another version of the same data, and whether it really is the same data? This is becoming a big problem on the wider web, made worse by URL shorteners, and it is especially disconcerting when it affects the provenance and accuracy of geospatial data.
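One partial answer to “is it the same data?”, not proposed in the post itself, is to publish a content digest alongside each link so that a copy found elsewhere can be checked. This sketch uses SHA-256 from Python’s `hashlib`; note that it only recognizes byte-identical copies, so a reprojected or re-encoded version of the same features would not match.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of a dataset's raw bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def same_data(original: bytes, mirror: bytes) -> bool:
    """True only if the mirrored copy is byte-for-byte identical."""
    return fingerprint(original) == fingerprint(mirror)


print(same_data(b"<kml>...</kml>", b"<kml>...</kml>"))
```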

But without Complexity

While linkability, durability, and discoverability are vital to GeoWeb standards, complexity carries a cost that inhibits adoption and reduces the likelihood of tool support.

This is a long-running argument in many circles, often made more difficult by practitioners who have worked in a field for years or decades and consider the most opaque formats or concepts commonplace. Look to the OWL/RDF/Semantic Web space for an example of the mismatch between proponents and the general public.

A standard needs to have clear value to developers and users for it to even begin to be considered. No one is going to dive into a dense specification of a format without even knowing why they would want to use it or how it fits into workflows and architecture.

And complexity can also surface in small ways: inconsistent capitalization of element names (you know who you are, KML), or support for a plethora of similar but different flavors that makes it unclear which to use (GeoRSS).

Tools

In this last section on the overall problems we’re facing with GeoWeb standards, the most prevalent, and easiest to address, is the lack of tools that interact with and convert between these formats. Formats don’t really matter to users: they have data from a source such as their camera, PND (personal navigation device), blog posts, or a government agency, and they want to do something with it, like understand what’s going on around them, find their favorite restaurant, save the rainforest, provide services, get their car fixed, or just share stories with their family.

Easy-to-use, engaging, data-agnostic tools are vital for the adoption of any format. Again, you only have to look as far as KML’s meteoric rise from application-specific format to perhaps the most ubiquitous, and still growing, GeoWeb standard, driven by the compelling reason of “I want to see my house and things going on around the world”.

Why do none of the major RSS news readers really support GeoRSS? Every site should offer KML and Atom output of its data. Mobile devices should let me open any of my data, from any of my services, in whatever mapping interface or app I choose.
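Offering both outputs need not be much work for simple point data. The sketch below renders one record as a KML Placemark and as an Atom entry carrying a GeoRSS-Simple point, using only `xml.etree.ElementTree` (the `default_namespace` argument to `tostring` needs Python 3.8 or later). The record’s field names and values are hypothetical; the namespaces are the published KML 2.2, Atom, and GeoRSS ones.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ATOM_NS = "http://www.w3.org/2005/Atom"
GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)  # serialize as georss:point


def to_kml(record):
    """Render a point record as a one-Placemark KML document."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = record["name"]
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML orders coordinates as longitude,latitude[,altitude].
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
        f"{record['lon']},{record['lat']}")
    return ET.tostring(kml, encoding="unicode", default_namespace=KML_NS)


def to_atom_entry(record):
    """Render the same record as an Atom entry with a GeoRSS-Simple point."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = record["name"]
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = record["id"]
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = record["updated"]
    # GeoRSS-Simple orders coordinates as "latitude longitude".
    ET.SubElement(entry, f"{{{GEORSS_NS}}}point").text = (
        f"{record['lat']} {record['lon']}")
    return ET.tostring(entry, encoding="unicode", default_namespace=ATOM_NS)


# Hypothetical record; the coordinates are approximate.
REC = {"id": "tag:example.com,2009:place-1", "name": "White House",
       "lat": 38.8977, "lon": -77.0365, "updated": "2009-05-01T00:00:00Z"}
print(to_kml(REC))
print(to_atom_entry(REC))
```

The opposite coordinate orders of KML and GeoRSS are a small example of the friction that conversion tools have to absorb on the user’s behalf.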

Missing Middle Ground

[Figure: GeoWeb Standards – Missing Middle Ground]

Amongst the plethora of formats, we’re really missing some middle ground. Each of these formats is quite independent of the others, with little cross-pollination or linking occurring.

  • Why can’t my KML file link to Atom updates and also to other formats?
  • Can OpenSearch describe my tile pyramid?
  • How do I describe my path through life, media, events, places I’ve lived, worked, and people I’ve known?

We too easily get caught between “this format must solve all possible problems” and “it’s good enough, so why change it?” In between, we need to converge on an understanding of use cases and of how these formats and specifications can cross various barriers, connecting experts with amateurs and citizens with authorities.

GeoWeb Standards Series

  1. Introduction
  2. Where We Are
  3. Problems

GeoWeb Standards – Where we are

Published in Geo, Standards


This article is Part 2 in an ongoing series discussing the current state of GeoWeb standards.

I started in the introduction by talking about the general Web and some of the unique challenges geospatial data standards face in achieving broader data interoperability.

In evaluating the current status, it’s useful to give an overview of the existing standards, with brief thoughts on where they are working and where they need specific attention.

I will also note that this discussion is focused on web-oriented geospatial data standards. Many other geospatial data formats exist, but they are too esoteric, too proprietary, or too poorly aligned with the Web to be worth considering for use on the broader Web.

Shapefiles

Of course, with the first example I will slightly bend the statement above. Shapefiles are the bastard geodata denizens of the web. They are annoying in multiple ways, foremost being that they are a proprietary data format found entirely too commonly across geodata portals, especially government portals. However, too much information is shared as Shapefiles, and too many open tools can use them, to ignore their place in the GeoWeb.

Shapefiles are difficult to work with on the web. They are like portable databases, but actually consist of several files: an attribute table (dbf), the geometries (shp), an index of record offsets into the geometries (shx), and optionally a projection definition (prj). Having originated as a binary storage format for ESRI’s desktop software, they have historic shortcomings such as a 10-character limit on attribute names and a restriction to a single geometry type per file (i.e. you can’t mix lines, points, and polygons).

In addition, with respect to web standards, they suffer from being spread across multiple files, lacking a MIME type, and having no web characteristics such as linking.
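Reading the fixed 100-byte header of a .shp file shows both the desktop-binary heritage (big-endian and little-endian fields side by side) and the single-geometry-type restriction: one shape type code covers the whole file. The offsets and codes below follow ESRI’s published Shapefile Technical Description; the demo header values in the usage test are made up.

```python
import struct

# Shape type codes from the ESRI Shapefile Technical Description.
SHAPE_TYPES = {
    0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
    21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
    31: "MultiPatch",
}


def parse_shp_header(header: bytes) -> dict:
    """Parse the fixed 100-byte header shared by .shp and .shx files."""
    if len(header) < 100:
        raise ValueError("header must be at least 100 bytes")
    (file_code,) = struct.unpack_from(">i", header, 0)        # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile")
    (length_words,) = struct.unpack_from(">i", header, 24)    # 16-bit words
    version, shape_type = struct.unpack_from("<ii", header, 28)  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack_from("<4d", header, 36)
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        # A single shape type for the whole file: no mixing points and polygons.
        "shape_type": SHAPE_TYPES.get(shape_type, f"unknown ({shape_type})"),
        "bbox": (xmin, ymin, xmax, ymax),
    }
```

In practice you would pass `open("file.shp", "rb").read(100)`; the attribute names live separately in the .dbf, which is where the 10-character limit comes from.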

Microformat-Geo & Adr

Microformats are a basic attempt at embedding data within generic HTML markup. The geospatial microformats are geo and adr, covering simple 2-D coordinates and an address respectively.

Microformats are nice because they align well within a prevalent data format and allow non-geographic expert users to easily embed information, either directly or via simple tools. Google and Yahoo both openly provide support for Microformats through improved search reliability and even some basic data manipulation tools via APIs. Other tools such as libraries and extensions also provide compelling use of Microformats with geospatial documents.

However, a basic limitation is that geo only allows for latitude and longitude, without any support for altitude. Adr can at least provide more complete address information, but neither geo nor adr allows linking to external geometries, a common shortcoming of most of the formats discussed.

Another problem with Microformats is that they don’t allow linking to context within a document. So while you can include location information in a paragraph, it is not possible to express how this location relates to the rest of an article or narrative.

So, for example, while it’s possible to mark up the location of the White House, one can’t easily denote whether this was the location of a press conference, or just that the U.S. President was there, or whatever else may have occurred.
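A small parser makes the limitation concrete: it recovers the coordinates from a `class="geo"` microformat, but nothing in the markup says what happened at that point. This is a simplified sketch using `html.parser`; it assumes the values are the text of `latitude`/`longitude` elements nested inside the `geo` element, ignores the abbreviated `<abbr title="lat;lon">` form, and does not handle void elements such as `<br>` inside the microformat. The sample sentence and approximate coordinates are illustrative.

```python
from html.parser import HTMLParser


class GeoMicroformatParser(HTMLParser):
    """Extract (latitude, longitude) pairs from class="geo" microformats."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the current geo element
        self.field = None   # "latitude" or "longitude" while inside one
        self.current = {}
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
            for name in ("latitude", "longitude"):
                if name in classes:
                    self.field = name
        elif "geo" in classes:
            self.depth = 1
            self.current = {}

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.field = None
        self.depth -= 1
        if self.depth == 0 and "latitude" in self.current and "longitude" in self.current:
            self.results.append((self.current["latitude"], self.current["longitude"]))

    def handle_data(self, data):
        if self.field and data.strip():
            self.current[self.field] = float(data.strip())


HTML = ('<p>Press conference at the <span class="geo">'
        '<span class="latitude">38.8977</span>, '
        '<span class="longitude">-77.0365</span></span>.</p>')
parser = GeoMicroformatParser()
parser.feed(HTML)
print(parser.results)
```

The words “press conference” sit outside the microformat, so the parser has no way to associate them with the location.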

GeoRSS

GeoRSS arose out of the simple desire to include location in the increasingly prevalent RSS and Atom feeds from blogs and news sources. It’s another community driven and owned standard, like Microformats, that met existing needs from a bottom-up approach.

GeoRSS over the past few years has become increasingly common amongst web sites using maps and geospatial data. Google Maps, Yahoo Maps, Microsoft Bing all support exporting and importing data via GeoRSS, and major news outlets such as Reuters, and Al-Jazeera output GeoRSS.

Despite its widespread adoption, GeoRSS has some complexities that arose out of its development. There are 9 potential “flavors” of GeoRSS, largely because there are 3 different feed formats (RDF, RSS, and Atom) and 3 encodings of GeoRSS itself (W3C, Simple, and GML) that can be used in any of them. This causes confusion for developers, especially since the W3C encoding is deprecated but still widely used. Perhaps this is one reason that, despite GeoRSS being a simple extension to existing feed formats, there is still no GeoRSS support in any of the major news feed readers, except perhaps limited support in FriendFeed.

In addition, GeoRSS hasn’t really advanced in quite a while, despite multiple requests and discussions about extensions for multiple locations, time spans, external geometries, and feature identification.
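To see what the flavors mean for a tool author, here is a sketch of a reader that normalizes a point from any of the three GeoRSS encodings in an item or entry. The namespaces are the standard GeoRSS, GML, and W3C Basic Geo ones; the function name and sample items are hypothetical, and only direct child elements are handled (for example, not W3C `geo:Point` wrappers).

```python
import xml.etree.ElementTree as ET

NS = {
    "georss": "http://www.georss.org/georss",
    "gml": "http://www.opengis.net/gml",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def point_from_item(item: ET.Element):
    """Return (lat, lon) from a feed item/entry in any of the three
    GeoRSS point encodings, or None if no point is present."""
    # GeoRSS-Simple: <georss:point>lat lon</georss:point>
    simple = item.find("georss:point", NS)
    if simple is not None and simple.text:
        lat, lon = simple.text.split()
        return float(lat), float(lon)
    # GeoRSS-GML: <georss:where><gml:Point><gml:pos>lat lon</gml:pos>...
    pos = item.find("georss:where/gml:Point/gml:pos", NS)
    if pos is not None and pos.text:
        lat, lon = pos.text.split()
        return float(lat), float(lon)
    # W3C Basic Geo (deprecated, as noted above, but common): <geo:lat>, <geo:long>
    lat_el, lon_el = item.find("geo:lat", NS), item.find("geo:long", NS)
    if lat_el is not None and lon_el is not None:
        return float(lat_el.text), float(lon_el.text)
    return None


SAMPLES = [
    '<item xmlns:georss="http://www.georss.org/georss">'
    '<georss:point>45.256 -71.92</georss:point></item>',
    '<entry xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml">'
    '<georss:where><gml:Point><gml:pos>45.256 -71.92</gml:pos></gml:Point></georss:where></entry>',
    '<item xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">'
    '<geo:lat>45.256</geo:lat><geo:long>-71.92</geo:long></item>',
]
for sample in SAMPLES:
    print(point_from_item(ET.fromstring(sample)))
```

Three code paths for a single point, before you even consider the feed format wrapped around it, illustrate why feed readers have been slow to add support.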

KML

KML, or Keyhole Markup Language, became a de facto standard through the popularity of Google Earth (formerly Keyhole EarthViewer) and the widespread creation and sharing of geographic data for use inside this compelling 3D geobrowser.

KML offers rich markup supporting feature locations, attributes, visual styling, 3D models, addresses, and even Atom links. In addition, it is now an OGC standard, and Google recently announced there were more than 500,000 KML files and 2 billion KML placemarks, or features (an average of about 4,000 placemarks per KML file).

However, KML is very clearly a direct object representation of the Google Earth application. Element names follow a rough camel-case convention based on parent or child classes, but this simple rule is sometimes broken in unclear ways, making it difficult for developers to build compliant tools. In addition, the styling capability is rudimentary, with little true cascading support and no attribute- or class-based styling.

Google continues to push the KML specification forward with vendor-specific gx: extensions. The rest of the geospatial community has yet to attempt to influence the spec in any way, despite these apparent problems.

GeoJSON

A much more recent community standard, GeoJSON merely adds geographic markup to the JSON format. It is primarily targeted at client-server communication and takes advantage of the compact size and quick evaluation of JSON data.

GeoJSON nominally follows the GeoRSS definitions, making it easy to understand and to leverage existing tools and knowledge. However, JSON itself does not provide any actual format specification or schema definition, leaving clients to rely on agreed-upon documentation rather than actual standards to determine the layout of the JSON. This is becoming especially problematic as more services expose GeoJSON via APIs to third-party developers. It is really little more than arbitrary, unique XML without the extensive syntax.
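Because plain JSON carries no schema, a client typically ends up writing checks like the one below by hand. This is an illustrative sketch, not a full GeoJSON validator: it builds a Point Feature (note GeoJSON’s `[longitude, latitude]` order) and verifies only the structure a Point Feature needs; the function names and sample values are hypothetical.

```python
import json


def make_feature(lon, lat, properties):
    """Build a GeoJSON Point Feature. GeoJSON orders [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def looks_like_point_feature(obj) -> bool:
    """Hand-written structural check; only Point geometries are handled."""
    if not isinstance(obj, dict) or obj.get("type") != "Feature":
        return False
    geom = obj.get("geometry")
    if not isinstance(geom, dict) or geom.get("type") != "Point":
        return False
    coords = geom.get("coordinates")
    if not (isinstance(coords, list) and len(coords) in (2, 3)
            and all(isinstance(c, (int, float)) for c in coords)):
        return False
    lon, lat = coords[0], coords[1]
    return -180 <= lon <= 180 and -90 <= lat <= 90 and "properties" in obj


doc = json.dumps({"type": "FeatureCollection",
                  "features": [make_feature(-77.0365, 38.8977, {"name": "White House"})]})
parsed = json.loads(doc)
print(all(looks_like_point_feature(f) for f in parsed["features"]))
```

Every consumer of an API has to repeat some version of this, and each may draw the line differently, which is the practical cost of relying on documentation rather than a schema.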

GML

In response to the history of arbitrary, unique XML, the Geography Markup Language (GML) was developed. GML provides a very strict and feature-rich mechanism for creating geographic schemas and domain-specific semantics. It is used for very precise data interchange, typically over OGC services like WFS. GML aims to span everything from 1-D to 4-D geometries, across multiple domains, with entirely customizable profiles, or versions, depending on a user’s or developer’s needs.

With GML’s power comes much complexity. Developers are typically required to devise and include their own unique schema definitions when using GML. The scope of writing a generic GML client is akin to writing a Ruby interpreter, which is daunting to general web developers who only want to add simple geographic capabilities to their services. This complexity hampers its widespread adoption.

Other formats

There are a variety of other formats that are beginning to emerge on the broader web through a variety of fronts.

Spatialite is the set of spatial extensions to the open, portable SQLite format. SQLite is a file database that provides for full relational capabilities in a single file. Spatialite therefore adds geographic columns and rudimentary geospatial query support.

SQLite is already used by a variety of tools, such as Google Gears for offline support and Google Maps on the iPhone for storing tiles. We chose Spatialite for our Geocoder due to its compact nature and deployability. Spatialite makes for a very compelling option when you need access to an entire geographic database and want to perform operations on the data.

GeoPDF is working to become an open format. There is a pending OGC adoption of the georegistration embedding, and Adobe is pushing the ISO 32000 spec, which includes how to embed vector and geographic drawing. However, there is still a very fragmented ecosystem of tools and interoperability, which threatens the format as a mechanism for disseminating geographic data.

CAP, the Common Alerting Protocol, is a realtime-focused format for sharing alerts such as emergency news, earthquakes, or municipal signals. It is still an XML format with no real mechanism for ensuring delivery or timeliness, and its advantages over more broadly used and extended formats such as Atom are not clear.
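The single-file relational model is easy to try with Python’s built-in `sqlite3`. To stay runnable without extra libraries, this sketch deliberately does not load SpatiaLite: there are no geometry columns or spatial indexes, just coordinate columns and a bounding-box query. SpatiaLite would add real geometry types and spatial functions on top of this; the table and approximate city coordinates are illustrative.

```python
import sqlite3

# Plain SQLite, not SpatiaLite: a single-file (here in-memory) relational
# table with coordinate columns, and no spatial index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [("Washington", 38.9072, -77.0369),   # approximate coordinates
     ("Ottawa", 45.4215, -75.6972),
     ("Denver", 39.7392, -104.9903)],
)


def in_bbox(conn, west, south, east, north):
    """Names of places inside a lon/lat bounding box (ignores the antimeridian)."""
    rows = conn.execute(
        "SELECT name FROM places WHERE lon BETWEEN ? AND ? "
        "AND lat BETWEEN ? AND ? ORDER BY name",
        (west, east, south, north),
    )
    return [name for (name,) in rows]


print(in_bbox(conn, -80.0, 35.0, -70.0, 50.0))
```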

There are still even more formats that are used in the GeoWeb such as CSV, GPX, RDF, and even OpenStreetMap (OSM). However, it is not really worth discussing these here as they are either too generic (CSV), or still too nascent (OSM) to really consider as an existing GeoWeb standard. They will, however, be discussed later in looking to the future of geodata formats.

Semantic data such as Linked Data, RDF, or OWL also continues to bubble beneath the surface. I will go into depth later on the potential of semantic geospatial data standards.

Services

Beyond just data formats, there are a number of GeoWeb service, or interface, standards. Open Geospatial Consortium (OGC) dominates this landscape and provides various querying specifications such as Web Feature Service (WFS) and Web Map Service (WMS) in addition to other cataloguing and location-based service interfaces.

WFS and WMS both provide very full-featured capabilities, but they also follow older interface paradigms. Fortunately, neither is SOAP-based; instead they rely on simple query parameters for specifying bounding boxes, layers, formats, and projections. Perhaps the biggest difficulty is that the service description lives at the same endpoint as the service itself, and servers often use the wrong MIME types for the documents and errors.

More recently, general web standards organizations such as the World Wide Web Consortium (W3C) have been adopting geospatial additions for browser DOM geolocation, HTTP location information, and privacy. ISO and OASIS are looking at OpenSearch-Geo for possible integration into their harvesting and cataloging standards.

OpenSearch-Geo follows the same concepts as GeoJSON and GeoRSS: it provides a simple extension to a broadly adopted interface, merely adding geospatial components to it. In addition, because it is only a templating specification, it can easily describe general APIs such as Flickr, KML network links, or even WFS when the appropriate template markup is applied.

However, while OpenSearch-Geo has garnered a lot of interest, its actual prevalence isn’t clear. Only a limited number of services offer an explicit, compliant description of their geospatial search interfaces.
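The query-parameter style looks like this in practice. The sketch builds WMS 1.1.1 GetCapabilities and GetMap URLs with `urllib.parse.urlencode`; note that both requests go to the same endpoint, distinguished only by `REQUEST`. The endpoint and layer name are hypothetical, and version 1.1.1 is chosen because its `SRS`/`BBOX` ordering is simpler than 1.3.0’s `CRS` axis-order rules.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; real servers publish their own base URL.
WMS_ENDPOINT = "https://example.com/wms"


def wms_url(request, **params):
    """Build a WMS 1.1.1 request URL. Capabilities and maps share one endpoint."""
    query = {"SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": request}
    query.update(params)
    return f"{WMS_ENDPOINT}?{urlencode(query)}"


capabilities = wms_url("GetCapabilities")
getmap = wms_url(
    "GetMap",
    LAYERS="landcover",            # hypothetical layer name
    STYLES="",
    SRS="EPSG:4326",
    BBOX="-141,41.7,-52.6,83.1",   # minx,miny,maxx,maxy (here lon/lat)
    WIDTH=800,
    HEIGHT=400,
    FORMAT="image/png",
)
print(capabilities)
print(getmap)
```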
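Since OpenSearch-Geo is a templating specification, a client mostly just fills in a URL template from a description document. This sketch substitutes `{name}` and optional `{name?}` parameters, including `geo:box`, which the Geo extension defines as `west,south,east,north` in decimal degrees. The template URL is hypothetical, and in a real description document the `geo` prefix must be bound to the Geo extension’s namespace.

```python
import re
from urllib.parse import quote

# Hypothetical template, as it might appear in an OpenSearch description's
# <Url template="..."> attribute. A trailing "?" marks an optional parameter.
TEMPLATE = ("https://example.com/search?q={searchTerms}"
            "&bbox={geo:box?}&format=atom")

PARAM = re.compile(r"\{([^}?]+)(\?)?\}")


def fill_template(template, values):
    """Substitute template parameters; optional ones default to empty."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Keep commas readable so the box stays "west,south,east,north".
            return quote(str(values[name]), safe=",")
        if optional:
            return ""
        raise KeyError(f"missing required parameter {name}")
    return PARAM.sub(replace, template)


url = fill_template(TEMPLATE, {"searchTerms": "land area",
                               "geo:box": "-141,41.7,-52.6,83.1"})
print(url)
```

The same substitution works whether the template points at a Flickr-style API, a KML network link, or a WFS endpoint, which is the flexibility described above.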

The View is Mixed

So the current state of GeoWeb data standards is quite mixed. There is no denying that they have become mainstream. In Mapufacture and GeoCommons, we’re seeing some more popular formats emerge, and diverge, both in uploads and in downloads or links. Google has released figures for the KML it has crawled on the web. By contrast, GeoCommons and Mapufacture rely on users to vet and register data sources, providing a different viewpoint into the utility of geospatial data formats on the web.

The above charts show the composition of data uploads, links, and the entire body of geodata uploaded to GeoCommons and Mapufacture. As an interesting comparison, downloads are clearly trending towards lighter-weight standards: KML accounts for 67.8% of all downloads, CSV for 25.7%, and Shapefiles for merely 6.3%. It is worth noting that this is only a narrow view of the larger web, not accounting for OGC standards, raster data, and services. However, it is still an enlightening look at how people are actively engaging with the GeoWeb.

The different formats have all been used extensively, but it isn’t clear which format is most appropriate when. This leads many applications to include multiple formats, an easy and appropriate solution, but also one that can confuse users and lead to duplication. We’ll dive into more general problems in the next article.

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability

GeoWeb Standards – Intro

Published in Geo


The use of geographic data, and interoperability on the web, has reached a level of maturity and penetration that warrants evaluation. Location-enabled mobile devices, services, and applications have crossed into mainstream use, and most search services and tools provide some means of finding or sharing information based on geographic location.

However, the state of the geospatial data and interoperability standards is quite mixed. We have both rigorous, standards body driven specifications that address formal needs, as well as community driven standards that have been emergent and lightweight by comparison. There has only been cursory cross integration between these methods, and looking forward there are still many unmet needs in current applications and new domains such as augmented reality, realtime sensors, narratives, external gazetteers, and general digital media questions of durability and archivability.

To consider where we are and where we need to be going, I will be writing a series of articles looking at various aspects of the Web and geospatial data, starting with a general overview and then diving into considerations of utilization, complexity, size, and finally suggestions for moving into the future of geographic standards.

These notes are from my talk at GeoWeb 2009 on “GeoWeb Standards, How Far We’ve Come, How Far we need to go.” They reflect a very active time in developing standards that accommodate some of the unique aspects of geospatial data, as well as the convergence of the geospatial community with the broader data and web communities.

Part of the Web

In looking at GeoWeb standards, it’s worthwhile to consider what simple features have made the web so effective and powerful. The first sentence of the Wikipedia article on the Web states:

The World Wide Web is a system of interlinked hypertext documents accessed via the Internet.

This is a surprisingly concise and poignant definition. The key components can be summarized as links, documents, accessibility, and openness. I am slightly altering the words to give credence to the important meanings underlying such ideas as “the Internet”. The Internet is the broadly open, universally accessible network of computer systems that allows anyone to access and publish information, a key component of the thriving, single Web.

The GeoWeb, a term used to direct ideas and conversation specifically towards geospatial concerns, is still an integral component of the general Web, in the same way that the Semantic Web, Realtime Web, and Participatory Web are different aspects of the same entity. It is both special and not unique.

However, it is still valuable to consider the role and mechanisms of integrating into the Web. Links and accessibility are just as suitable and necessary for geographic information, though they are often surprisingly missing or argued against. A common concern is that place is inherently more sensitive and should therefore be kept private and secure.

Additional unique considerations include findability, discovery, collaboration, and unification. Geographic data is inherently sortable, due to the mathematical nature of its construction in dimensional space, but it is also continuous in ways that textual or categorical information is not.

I’ve spoken before about the value of geography as a common context through which we can combine and compare disparate domains of data, but this also makes it difficult to use web constructs to link to data such as “weather near Bermuda last week” or “place of performance versus vendor of contracts”. Should this information be shared through geographic interfaces of place, bounding boxes, or pagination of tiles?

And geography also has the benefit, and difficulty, of unique place and identification. This makes linking data together more feasible, such as when describing a building through business information, government zoning, weather, and user location. But it becomes more difficult when conflating floors or offices within buildings, within larger regions, that change over time.
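One well-known technique that exploits this sortability, not discussed in the post itself, is the geohash: interleaving longitude and latitude bisection bits turns a 2-D point into a string, so nearby points usually share a prefix and a plain string sort tends to group them. “Usually” is the continuity problem in miniature: two points on either side of a cell boundary can be close yet have different prefixes. A minimal encoder:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(lat, lon, precision=8):
    """Encode a point as a geohash by interleaving lon/lat bisection bits."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # upper half: emit 1, raise the lower bound
            rng[0] = mid
        else:
            bits <<= 1               # lower half: emit 0, lower the upper bound
            rng[1] = mid
        use_lon = not use_lon        # alternate longitude and latitude bits
        bit_count += 1
        if bit_count == 5:           # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash(38.8977, -77.0365, 6))
```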
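As one illustration of the “pagination of tiles” option, here is the widely used Web Mercator “slippy map” tile scheme (the convention used by OpenStreetMap and Google Maps, not something the post specifies), which turns a point and zoom level into a stable, linkable tile address. The tile base URL is hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator tile (x, y) containing a point at a given zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # asinh(tan(lat)) equals ln(tan(lat) + sec(lat)), the Mercator y.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the edges (and beyond ~85.05 degrees) into valid tiles.
    return max(0, min(x, n - 1)), max(0, min(y, n - 1))


def tile_url(lon, lat, zoom, base="https://tiles.example.com"):
    """Hypothetical linkable URL for the tile covering a point."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return f"{base}/{zoom}/{x}/{y}.png"


print(tile_url(-64.75, 32.3, 8))  # roughly Bermuda
```

A tile address is easy to link and cache, but it only captures “near here”; the “last week” half of a query like “weather near Bermuda last week” still needs another mechanism.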

In the next article, I will do a quick survey of where we are with GeoWeb standards.

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability

GeoWeb Standards – Your thoughts

Published in Conference, Geo, Standards


Later this week I’m speaking at GeoWeb about the current progress of GeoWeb standards, how far we have to go, and how to get there. We have KML and GeoRSS leading the way in searchable, linkable formats, but still a plethora of Shapefiles strewn about. There are questions of findability, semantic ontologies, durability, and expressiveness. What are the adoption rates of these formats, and what is their utility in the future real-time, mobile, linked, open web?

What else do you see as the good and bad of GeoWeb standards?