

Standards

OGC Mass Market working group discussion from Darmstadt

Published in Standards


In keeping with my history of trying to shed some light, from an outside perspective, on how the OGC works (such as live-blogging the OWS-5 kickoff, the Geospatial search summit, and Google’s libkml announcement), I thought it would be interesting to cover the OGC Mass Market working group telecon that’s being held in Darmstadt, Germany.

OpenSearch-Geo

Pedro Goncalves presented his work taking the OpenSearch-Geo specification and forming it into an OGC-acceptable document. Pedro has been doing great work extending OpenSearch-Geo for accessing earth observation data. I also talked about how OGC services could be described with OpenSearch templates over a year ago in Atlanta.

Unfortunately, the presentation and document are currently locked away behind the OGC’s portal. Hopefully Pedro will release them publicly. In addition, it’s not clear how the OGC can adopt such an externally developed standard. OpenSearch-Geo is not part of the OGC, so it must go through various OGC architecture boards and discussions before it can potentially be accepted as a whitepaper.

We wrote OpenSearch-Geo at WhereCamp 2007, and since then it has remained a draft standard with varying uptake across projects. Formalizing it as a standard should have a very positive effect on its adoption.

GeoCommons supports 3 OpenSearch description documents, one each for Finder, Maker, and all of GeoCommons.

In the end, Pedro’s paper was accepted as a “discussion paper”. Hopefully we can push it further at the next Technical Committee meeting in Mountain View in December, where DeWitt Clinton (the original author of OpenSearch) will hopefully drop in to help move it forward.

The rest of the working group meeting discussed a potential GeoSMS format that ITRI from Taiwan is working on.

Unfortunately, we didn’t get to talking about the potential <geomap> HTML element ideas. There will be more discussion about that on the OGC Mass Market email list.


GeoWeb Standards – Discoverability

Published in Geo, Standards


We have a rich history of geography, cartography, and GIS that is currently tucked away in top drawers, intranets, and repositories that may not stay online when we most need the data. How do we expose these huge troves of data in a way that can be used across various domains? The GeoWeb is part of the same web as the semantic, sensor, and social webs (and the interplanetary one). So it is vital that the GeoWeb align itself with the web and the multitude of sources and endpoints the web is reaching into.

There are many possible solutions, and a few are within easy reach: we can build our tools around them and develop practices that encourage their use, while still moving toward better ones as the GeoWeb matures. Over the next few articles we’ll look at specific solutions.

Discoverability

Perhaps the most prevalent issue, and the one that is most easily addressable, is the findability and discovery of geodata on the web. Mano Marks reflected this same sentiment in his blog post on standards.

In thinking about discoverability, there are several primary use cases to consider: machine crawling, human discovery, and tool discovery. Providing data through only a single mechanism means it doesn’t get used and consumed to its potential, and somewhere along the chain of use it becomes a burden to incorporate into a workflow.

Think of the machines

Machine crawling is the ability of any spider to walk links and find data, metadata, and formats automatically. It’s what Google or GeoNetwork do to find and register data sources.

There was recently a discussion on auto-discovery in the GeoWeb suggesting the use of robots.txt, sitemaps, or embedded META tags in HTML pages.

Consider how a spider gets to a site: it follows a link to a geospatial portal from some blog or resource, or starts from a URL entered directly as a good place to get data. It makes a GET request for the root homepage, “/”, which most likely returns the index.html equivalent. The program then parses that page for links or additional information.

If the spider knows about them, then it may ask for a sitemap.xml or robots.txt. But nothing in the original page request noted that this potentially very complete listing of data was there. This problem is the equivalent of an application having to know that it needs to ask for a GetCapabilities or other method to even discover what is available. Too much implicit knowledge of the specification is required for a program to easily discover new data and services.

What the program does see are these links that can contain information such as a link to a list of available resources. The simplest is a link to the Atom or RSS feed that can simply be a paginated list of all the resources available to the application. Within Atom, there is then the ability to link to various representations of that data in different formats. So applications are able to take the most appropriate format based on what they can consume.
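To make the link-walking step concrete, here is a minimal sketch using Python’s standard html.parser. The homepage markup and its URLs are hypothetical, and filtering on the alternate and search relations is an illustrative choice rather than a rule from any specification.

```python
from html.parser import HTMLParser

# Link relations a geodata crawler might care about (an assumed, minimal set).
INTERESTING_RELS = {"alternate", "search"}


class LinkCollector(HTMLParser):
    """Collect <link rel=... href=...> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are routed here by HTMLParser too.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()  # rel may hold several tokens
        if a.get("href") and INTERESTING_RELS.intersection(rels):
            self.links.append((" ".join(rels), a.get("type"), a["href"]))


def discover_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical homepage of a geodata portal.
HOMEPAGE = """<html><head>
<link rel="alternate" type="application/atom+xml" href="/datasets.atom">
<link rel="alternate" type="application/vnd.google-earth.kml+xml" href="/datasets.kml" />
<link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml">
<link rel="stylesheet" href="/style.css">
</head><body></body></html>"""

for rel, mime, href in discover_links(HOMEPAGE):
    print(rel, mime, href)
```

Nothing here depends on a pre-agreed file location: the page itself tells the crawler where the feed, the KML, and the search description live.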

Several years ago I first proposed how KML and GeoRSS could easily support one another via cross-links and with HTML documents. Atom has very nice rel and type attributes that allow for linking to all sorts of different representations. You can even link to OGC services like WMS and WFS using atom links.
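As a sketch of how a client might pick the representation it can consume, the following parses one Atom entry and chooses among its rel="alternate" links by MIME type. The entry, its example.com URLs, and the preference lists are hypothetical; the WMS link only illustrates that an OGC service can sit behind an ordinary Atom link.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical Atom entry offering several representations of one dataset.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Land and freshwater area</title>
  <link type="text/html" href="http://example.com/datasets/42"/>
  <link rel="alternate" type="application/vnd.google-earth.kml+xml"
        href="http://example.com/datasets/42.kml"/>
  <link rel="alternate" type="application/vnd.ogc.wms_xml"
        href="http://example.com/wms?SERVICE=WMS&amp;REQUEST=GetCapabilities"/>
  <link rel="related" type="text/html" href="http://example.com/about"/>
</entry>"""


def alternates(entry_xml):
    """Map MIME type -> href for every alternate link in an Atom entry."""
    entry = ET.fromstring(entry_xml)
    offers = {}
    for link in entry.findall(ATOM + "link"):
        # RFC 4287: a link without a rel attribute is treated as rel="alternate".
        if link.get("rel", "alternate") == "alternate":
            offers[link.get("type")] = link.get("href")
    return offers


def pick_representation(entry_xml, preferences):
    """Return the href of the first preferred MIME type the entry offers."""
    offers = alternates(entry_xml)
    for mime in preferences:
        if mime in offers:
            return offers[mime]
    return None


print(pick_representation(ENTRY, ["application/json", "application/vnd.google-earth.kml+xml"]))
```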

Of particular interest here is the currently approved list of Atom link relation types, which provide basic semantics telling you what a link means. Is it another page? Just related? It’s a limited set, but one that covers an approachable majority of cases for developers to begin using.

For example, mechanisms like OpenSearch, advertised with rel="search", simply notify the application that here is a service it can query to get at additional resources. With OpenSearch-Geo, a GeoWeb crawler can also query for information within a specific location or bounding area.
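A minimal sketch of how a crawler might use an OpenSearch-Geo description: find the URL template for a response type, then fill in {searchTerms} and {geo:box}. The description document and URLs are hypothetical, and the code assumes the document binds the geo extension namespace to the prefix geo; a robust client would resolve template prefixes through the declared namespaces instead.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

OS_NS = "{http://a9.com/-/spec/opensearch/1.1/}"
# Matches {name} or {name?}; a trailing "?" marks an optional parameter.
PARAM = re.compile(r"\{([^}?]+)(\?)?\}")

# Hypothetical OpenSearch description document with a geo:box parameter.
OSDD = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
    xmlns:geo="http://a9.com/-/opensearch/extensions/geo/1.0/">
  <ShortName>Example geodata</ShortName>
  <Description>Hypothetical geodata search</Description>
  <Url type="application/atom+xml"
       template="http://example.com/search?q={searchTerms}&amp;bbox={geo:box?}&amp;format=atom"/>
</OpenSearchDescription>"""


def find_template(osdd_xml, mime):
    """Return the URL template offered for a given response MIME type."""
    root = ET.fromstring(osdd_xml)
    for url in root.findall(OS_NS + "Url"):
        if url.get("type") == mime:
            return url.get("template")
    return None


def fill_template(template, values):
    """Substitute parameters; unsupplied optional ones become empty strings."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe=",")
        if optional:
            return ""
        raise KeyError(f"required parameter {name!r} not supplied")
    return PARAM.sub(substitute, template)


template = find_template(OSDD, "application/atom+xml")
# geo:box is west,south,east,north in decimal degrees
print(fill_template(template, {"searchTerms": "lakes", "geo:box": "-141.0,41.7,-52.6,83.1"}))
```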

Humans need data too

Crawlers are great: they provide a way to pull information together into other sites and tools that offer customized interfaces to users. However, within any site or tool, how should we expose geodata so that humans can easily use it for whatever purposes they may have?

Again, links have become a very well understood concept on the Web. That underlined blue line states “beyond me lies an unspecified amount of information about this topic“. However, these links typically imply that they will open another human readable HTML page in the browser. A problem caused by links to media such as geospatial data is that the content behind a link may not be just text, it could be an image, audio, movie, KML, database, or a service. Clicking on that link relies on the browser interpreting the MIME-type (remember the point about how vital mime-types are?) and opening the application the user has specified, or left as default.

So consider what this means for generic media. Clicking a link to an image probably just opens the image in your browser, and opening a movie loads an embedded video player. Geodata browsers, however, probably don’t have the same install base as, say, QuickTime, except perhaps Google Earth. The Web has become much more comfortable with clicking on a KML link and seeing Google Earth open up and show the data on a globe.

But something very vital often exists with a link to KML data – a recognizable icon that notifies the user (as they learn) that it is a file that will open in Google Earth, or another KML viewer. This is the same as the very widely used RSS icon.

I discussed this idea before about the geotag icon showing various other formats – and now sites like Data.gov actually show the various data format options.

[Image: Data.gov Raw Data Catalog]

So what GeoWeb standards need is some visual cue telling people that they can click on this link and open a spatial relational database, or an OGC service, and have some confidence that an application will give them a useful way to access the data. (And I’m still waiting for Sean Gillies’ ISO and Dublin Core icons.)

Of course, we should also employ emergent interfaces that show users the type of data links that are appropriate for them based on their profile or registered MIME-type handlers.

Man-Machine hybrids

So we have discovery links for machine crawlers to register and harvest geodata, and links for humans to follow to data and within data. However, having to click through every link can quickly become overwhelming. Imagine browsing Flickr through Lynx.

Browsers already do a lot to help users find relevant extra pieces of data in a page. RSS autodiscovery links show up in URL bars, notifying us that our feed readers can subscribe to this page. OpenSearch allows someone to add a site’s search to their browser (most browsers, at least) to easily search the repository later.

The decreasing cost of links

These various approaches for different needs and use cases are all well aligned. They don’t rely on additional external files that we must keep up to date, or on tools being built to just know that a file can be found at a predefined location. Links cost next to nothing, mostly a few bytes of bandwidth, but provide a wealth of accessibility and discoverability for geospatial data, especially data in formats that suit the tools and use cases of different problems.

Of course, links alone don’t address all the needs of the evolving GeoWeb, they merely provide for the integration of geospatial data with the rest of the web. An important, necessary, but not entirely sufficient first step. We need to consider the actual uses and interfaces of these standards, archival, synchronization, conflation and more.

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability

GeoWeb Standards – Where we need to go

Published in Geo, Standards


In previous articles on the status of the GeoWeb I highlighted the myriad options and problems with current GeoWeb standards and interfaces. Overall, it’s clear that the practice of publishing and sharing geospatial data in a web-oriented way is still very nascent, but it is improving as it becomes more mainstream. More data is being created and published in web-oriented ways that make it more consumable and usable.

Too often standards and tools are built by domain experts and technologists in ways that lead to overly complex and irrelevant formats, which become a burden and introduce as many problems as they try to solve.

What they often don’t consider is the end-user experience. Who are the users, what are they trying to achieve, and how can these formats make the tools better and easier to use?

Granted, there are expert users: people who really want to make intricately related, projected, spatially and spectrally bounded queries into data and use them in advanced analytics engines. But these are not the majority, and they’re not what is driving the long-term demand on the GeoWeb (you can use ‘long-tail’ here if you would like). Who are the users that want to engage with this information on a daily basis in their personal lives, businesses, families, safety, governance, and goals?

Grassroots is an option

I’m a very big fan of grassroots organization and emergent structures. The needs tend to grow from real demand, and solutions are built through actual demonstrated benefit and impact. They are agile, evolutionary, and garner broad support amongst users and developers. These are all aspects that are beneficial to achieving standards that meet the needs of end users and provide good experiences.

However, it is not the only solution. Grassroots efforts tend to look at immediate needs and may not account for more distant issues and expected needs. They seek broad appeal and “good enough” rather than encompassing every aspect of every interested domain. Top-down, industry-derived, committee-driven standards provide more directed needs and objectives that can serve different types of users.

So the solution is a hybrid – where grassroots solutions are encouraged as demonstrators and emergent needs – that are then accepted and supported by more formal organizations.

Conversation is required

But we also need to open up the conversation beyond just technologists and experts. We need to engage with and understand users, asking not merely “how do I sell them more of my coffee?” but “what can I do to make their lives better?”, and actually engage them in dialogue.

This technique of user stories, and engagement is not new or unused. However it appears to be missing from the GeoWeb standards developments. We’ve been designing standards for ourselves first, and then foisting these upon others. Instead, we need to understand their needs and issues, and then apply our expert knowledge in how to approach solutions properly.

Other articles in the GeoWeb Standards series:

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability

GeoWeb Standards – Current Problems

Published in Geo, Standards


Part 3 in looking at the current state of GeoWeb Standards. See the introduction here.

It’s time to take a hard look across the board at where we’re coming up short and issues that need to be addressed. One way to summarize:

GeoRSS, KML, and GeoJSON are the itching powder, squirting ink pen, and dribble cup of geodata formats.
Sean Gillies

Sean is known for his candor, and his viewpoint definitely has merit. Overall, the various formats and standards fulfill various needs, but they still don’t cover all use cases, align well with best practices, or make sense to users and developers.

The simplest overall problem with many of these formats, and how they fit into the Web, is that they lack proper web-type descriptions. One primary mechanism Web clients use to decide how to present data is the MIME type. MIME types let the server notify clients that the data is in a format such as XML, text, a PNG image, and so on. Standard types are formally registered with IANA, but ad-hoc and vendor-specific types are also common.

In addition, MIME types allow crawlers and registries to easily record the type of the file in the metadata.

Looking over our list of various GeoWeb standards, it’s very easy to identify which formats abide by this and which don’t.

Atom, JSON, HTML, and SQLite all provide format-specific MIME types, allowing clients to easily employ the proper applications. However, none provides a special mechanism for signaling that the data includes geospatial markup. That is not necessarily a problem; geo shouldn’t be that special.

KML is perhaps the only format that has a geospatial-specific MIME type. Despite KML now being an OGC standard, its MIME type is still vendor-specific: application/vnd.google-earth.kml+xml. KML was also particularly ingenious in providing a unique MIME type for the compressed, or zipped, format: application/vnd.google-earth.kmz.
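On the serving side, the MIME type is often just a lookup from file extension. A minimal sketch using Python’s standard mimetypes module, registering the two KML types named above; whether a particular web server needs this depends on its own configuration.

```python
import mimetypes

# Register KML and KMZ so guess_type() maps the extensions to them,
# regardless of what the local system's mime.types file contains.
mimetypes.add_type("application/vnd.google-earth.kml+xml", ".kml")
mimetypes.add_type("application/vnd.google-earth.kmz", ".kmz")


def content_type_for(filename):
    """Pick the Content-Type a server should send for a file."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"  # generic fallback for unknown types


for name in ("lakes.kml", "lakes.kmz", "lakes.shp"):
    print(name, content_type_for(name))
```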

GML is just XML, so its MIME type does nothing to tell a client that it should pass the data on to a geo-enabled application. And Shapefiles are an agglomeration of multiple files; even zipped up, they are only marked as compressed files.

More broadly in services, the OGC has a mime type for service descriptions and responses: application/vnd.ogc.wms_xml, though errors have their own MIME-types: application/vnd.ogc.se_xml.
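Because a WMS answers a map request with either an image or an XML exception, a client has to branch on the response’s Content-Type. The following is a hedged sketch of that check; the fallback that sniffs the body for the WMS 1.1.1 and 1.3.0 root elements is an assumption added to cope with servers that send generic XML types.

```python
OGC_TYPES = {
    "application/vnd.ogc.wms_xml": "capabilities",
    "application/vnd.ogc.se_xml": "service exception",
}


def classify_response(content_type, body=b""):
    """Classify a WMS response by its Content-Type header (body as a fallback)."""
    mime = content_type.split(";", 1)[0].strip().lower()  # drop "; charset=..."
    if mime in OGC_TYPES:
        return OGC_TYPES[mime]
    if mime.startswith("image/"):
        return "map image"
    # Assumed fallback for servers that label everything as generic XML
    if mime in ("text/xml", "application/xml"):
        if b"<ServiceExceptionReport" in body:
            return "service exception"
        if b"<WMT_MS_Capabilities" in body or b"<WMS_Capabilities" in body:
            return "capabilities"
    return "unknown"


print(classify_response("application/vnd.ogc.se_xml"))
```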

OpenSearch has a special MIME Type, and obviously Tiles and Image files have MIME-types.

Doesn’t matter if you can’t download it

Another major issue facing many of the GeoWeb formats is file size. Generally, the web bounces back and forth between disregarding size, on the assumption of ubiquitous high-speed and reliable connectivity, and trying to speed up pages. Even more important is the fact that many potential users don’t have access to high-speed internet, so there is a huge difference between 10 KB, 100 KB, and 1 MB of data.

To compare the sizes, I took a relatively large dataset from GeoCommons, Statistics Canada, Land and freshwater area, Canada, 2005 and exported it in a variety of formats, both uncompressed, and compressed via standard zip algorithms.

Format       Size      Zipped
CSV          1.3 KB
Shapefile    5.4 MB    3.6 MB
GeoRSS       3.3 MB    1.1 MB
KML          7.3 MB    2.4 MB
Spatialite   5.4 MB    3.6 MB
JSON         7.9 MB    2.3 MB
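The table’s numbers can be turned into compression savings with a little arithmetic. This short sketch copies the megabyte figures from the table (CSV is left out because no zipped size was given). It shows the text formats (JSON, KML, GeoRSS) shrinking by roughly two thirds when zipped, while Shapefile and Spatialite shrink by about a third.

```python
# (uncompressed MB, zipped MB), copied from the table above
SIZES_MB = {
    "Shapefile": (5.4, 3.6),
    "GeoRSS": (3.3, 1.1),
    "KML": (7.3, 2.4),
    "Spatialite": (5.4, 3.6),
    "JSON": (7.9, 2.3),
}


def percent_saved(raw, zipped):
    """Percentage of the original size removed by compression."""
    return 100 * (1 - zipped / raw)


ranked = sorted(SIZES_MB.items(), key=lambda kv: percent_saved(*kv[1]), reverse=True)
for fmt, (raw, zipped) in ranked:
    print(f"{fmt:<10} {percent_saved(raw, zipped):5.1f}% smaller when zipped")
```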

CSV just includes latitude and longitude columns for the centroid, so it is obviously not fully representative. One option would be to include the EWKB in a column for the full geometry, but that is far from any kind of ‘standard’ that other tools would know how to interpret.
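For illustration, here is what a centroid CSV with an extra EWKB column might look like, as a sketch using only Python’s standard library. It encodes points only; EWKB is a PostGIS extension of OGC Well-Known Binary, and the column names are hypothetical, which is exactly why other tools may not know how to read them.

```python
import csv
import io
import struct

WKB_POINT = 1
EWKB_SRID_FLAG = 0x20000000  # PostGIS extension: an SRID follows the type code


def ewkb_point_hex(lon, lat, srid=4326):
    """Hex-encode a 2-D point as little-endian EWKB with an embedded SRID."""
    # '<' = little-endian, no padding: byte order, uint32 type, uint32 SRID, x, y
    raw = struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, lon, lat)
    return raw.hex().upper()


def write_rows(rows):
    """rows: iterable of (name, lat, lon); returns CSV text with an ewkb column."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "latitude", "longitude", "ewkb"])
    for name, lat, lon in rows:
        writer.writerow([name, lat, lon, ewkb_point_hex(lon, lat)])
    return out.getvalue()


print(write_rows([("Ottawa", 45.42, -75.69)]))
```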

Perhaps most surprising in these results is how large the JSON is. Unfortunately, complex geometries require a lot of syntax, which adds up when representing polygonal data.

Linkability, Durability, and Discoverability

Moving past purely file format and data type specifications brings up the issue of discoverability and linkability in GeoWeb standards. The Web is more than a list of documents that mention resources; those documents actually link to durable endpoints that can be resolved, queried, accessed, and parsed.

Non-web native formats have no concept of linking. CSV, Shapefiles, and SQLite contain data, but no links. By contrast, Atom, GML, and KML are chock-full of links, although not always used to great effect. JSON can contain links, but without a schema, who knows what the link means.

Obviously the best model to follow here is HTML, which provides automatic links to feeds, OpenSearch description documents, pages, media, styles, and scripts.

However, what happens when a resource disappears and is no longer resolvable? How do you know where else to get another version of the same data, and is it the same data? This is becoming a big problem in the larger web, made more problematic by the use of URL shorteners, but also especially disconcerting when it affects the provenance and accuracy of geospatial data.

But without Complexity

While linkability, durability, and discoverability are vital to GeoWeb standards, the cost of complexity inhibits adoption and lowers the likelihood of support.

This is a long-running argument in many circles, often made more difficult by practitioners who have worked in a field for years or decades and consider the most opaque formats or concepts commonplace. Look to the OWL/RDF/Semantic Web space for an example of the mismatch between proponents and the general public.

A standard needs to have clear value to developers and users for it to even begin to be considered. No one is going to dive into a dense specification of a format without even knowing why they would want to use it or how it fits into workflows and architecture.

And complexity can also surface in small ways: inconsistent capitalization of element names (you know who you are, KML), or support for a plethora of similar but different flavors, making it unclear which to use (GeoRSS).

Tools

In this last section on the overall problems we’re facing with GeoWeb standards, I’ll cover the most prevalent and most easily addressed one: the lack of tools that interact with and convert between these formats. Really, formats don’t matter to users. They have data from a source such as their camera, PND, blog posts, or a government agency, and they want to do something with it: understand what’s going on around them, find their favorite restaurant, save the rainforest, provide services, get their car fixed, or just share stories with their family.

Easy to use, engaging, and data agnostic tools are vital for adoption of any formats. Again you only have to look as far as KML’s meteoric rise from application specific format to perhaps the most ubiquitous, and growing, GeoWeb standard due to the compelling reason of “I want to see my house and things going on around the world”.

Why do none of the major RSS news readers really support GeoRSS? Every site should offer KML and Atom output of their data. Mobile devices should let me open any of my data, from any of my services, in whatever mapping interface or app I choose.

Missing Middle Ground

[Image: GeoWeb Standards - Missing Middle Ground]

Amongst the plethora of formats, we’re really missing some middle ground. Each of these formats is quite independent of the others, with little cross-pollination or linking occurring.

  • Why can’t my KML file link to Atom updates and also to other formats?
  • Can OpenSearch describe my tile pyramid?
  • How do I describe my path through life, media, events, places I’ve lived, worked, and people I’ve known?

We too easily get caught either in this “this format must solve all possible problems”, or “it’s good enough so why change it”. In between we need to converge to understand use cases, and how these formats and specifications can cross various barriers – connecting the experts with the amateurs, the citizens and the authorities, one with another.

GeoWeb Standards Series

  1. Introduction
  2. Where We Are
  3. Problems

GeoWeb Standards – Where we are

Published in Geo, Standards


This article is Part 2 in an ongoing series discussing the current state of GeoWeb standards.

I started in the introduction by talking about the general Web and some considerations of how geospatial data standards face unique challenges in resolving to broader data interoperability.

In evaluating the current status of standards, it’s useful to give an overview of them, with brief thoughts on where they work and where they need specific attention.

I will also note that this discussion is focused on web-oriented geospatial data standards. Many other geospatial data formats exist, but they are too esoteric, too proprietary, or not Web-aligned enough to be worth considering for use on the broader Web.

Shapefiles

Of course, with the first example I will slightly bend the statement above. Shapefiles are the bastard geodata denizens of the web. They are annoying in multiple ways, foremost in being a proprietary data format that is found entirely too commonly across geodata portals, especially government portals. However, there is too much information shared in them, and too many open tools that can use them, to ignore their place in the GeoWeb.

Shapefiles are difficult to work with on the web. They are like portable databases, but actually consist of several files: attribute data (dbf), geometries (shp), a geometry index (shx), and optionally a projection definition (prj). Deriving from their use as a binary storage format for desktop software developed by ESRI, they have historic shortcomings such as a 10-character limit on attribute names and restriction to a single geometry type (i.e. you can’t mix lines, points, and polygons).
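Since a shapefile is really a bundle, even a basic web upload handler has to check that the parts arrived together. A minimal sketch, assuming lowercase extensions (real-world bundles sometimes use .SHP and friends) and treating .shp, .shx, and .dbf as required with .prj optional; the file names are hypothetical.

```python
import tempfile
from pathlib import Path

REQUIRED_PARTS = (".shp", ".shx", ".dbf")  # geometry, geometry index, attributes
OPTIONAL_PARTS = (".prj",)                 # projection definition


def shapefile_parts(shp_path):
    """Return (missing required extensions, optional extensions present)."""
    base = Path(shp_path)
    missing = [ext for ext in REQUIRED_PARTS if not base.with_suffix(ext).exists()]
    optional = [ext for ext in OPTIONAL_PARTS if base.with_suffix(ext).exists()]
    return missing, optional


# Demo: simulate an upload where the .shx file was forgotten.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".dbf", ".prj"):
        (Path(tmp) / f"lakes{ext}").touch()
    missing, optional = shapefile_parts(Path(tmp) / "lakes.shp")

print("missing:", missing, "optional present:", optional)
```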

In addition, with respect to web standards, they obviously suffer from the multiple files, the lack of a MIME type, and the absence of web characteristics such as linking.

Microformat-Geo & Adr

Microformats are a basic attempt at embedding data within generic HTML markup. The geospatial microformats cover simple 2-D coordinates or an address: geo and adr, respectively.
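To show how lightweight the geo microformat is to consume, this sketch extracts coordinates from the nested-span form (class="geo" containing class="latitude" and class="longitude") using Python’s html.parser. It is a simplification: it ignores the abbr/title form of the microformat and does not check that the spans actually sit inside a geo container. The sample markup is hypothetical.

```python
from html.parser import HTMLParser


class GeoMicroformatParser(HTMLParser):
    """Pull (latitude, longitude) pairs out of nested geo microformat spans."""

    def __init__(self):
        super().__init__()
        self.points = []
        self._field = None   # "latitude" or "longitude" while inside that element
        self._pending = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in ("latitude", "longitude"):
            if name in classes:
                self._field = name

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self._pending[self._field] = float(data.strip())
            if len(self._pending) == 2:  # both halves seen: emit one point
                self.points.append((self._pending["latitude"], self._pending["longitude"]))
                self._pending = {}


def extract_geo(html):
    parser = GeoMicroformatParser()
    parser.feed(html)
    parser.close()
    return parser.points


SAMPLE = ('<p>The press conference was held at the White House '
          '(<span class="geo"><span class="latitude">38.8977</span>, '
          '<span class="longitude">-77.0365</span></span>).</p>')
print(extract_geo(SAMPLE))
```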

Microformats are nice because they align well within a prevalent data format and allow non-geographic expert users to easily embed information, either directly or via simple tools. Google and Yahoo both openly provide support for Microformats through improved search reliability and even some basic data manipulation tools via APIs. Other tools such as libraries and extensions also provide compelling use of Microformats with geospatial documents.

However, a basic limit is that geo only allows for latitude and longitude, without any support for height. Adr at least can provide more complete information, but neither geo nor adr allows linking to external geometries, a common shortcoming of most of the formats discussed.

Another problem with Microformats is that they don’t allow linking to context within a document. So while you can include location information in a paragraph, it is not possible to express how this location relates to the rest of an article or narrative.

So, for example, while it’s possible to mark up the location of the White House, one can’t easily denote whether this was the location of a press conference, or just that the U.S. President was there, or whatever else may have occurred.

GeoRSS

GeoRSS arose out of the simple desire to include location in the increasingly prevalent RSS and Atom feeds from blogs and news sources. It’s another community driven and owned standard, like Microformats, that met existing needs from a bottom-up approach.

GeoRSS over the past few years has become increasingly common amongst web sites using maps and geospatial data. Google Maps, Yahoo Maps, Microsoft Bing all support exporting and importing data via GeoRSS, and major news outlets such as Reuters, and Al-Jazeera output GeoRSS.

Despite its widespread adoption, GeoRSS has some complexities that arose out of its development. There are 9 potential “flavors” of GeoRSS, largely because there are 3 different feed formats: RDF, RSS, and Atom. On top of that there are 3 encodings of GeoRSS itself, each usable in any of the 3 feed formats: W3C, Simple, and GML. This causes confusion for developers, especially since the W3C encoding is deprecated but still widely used. Perhaps this is one reason that, despite GeoRSS being a simple extension to existing feed formats, there is still no GeoRSS support in any of the major news feed readers, except perhaps limited support in FriendFeed.
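To show what the flavor problem means for a consumer, this sketch reads a point from an Atom entry in any of the three GeoRSS encodings, using Python’s ElementTree. The sample feed is hypothetical, and only points are handled; lines, polygons, and boxes, as well as the geo:Point wrapper sometimes used with the W3C vocabulary, are left out.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
    "gml": "http://www.opengis.net/gml",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}

# The same hypothetical point, expressed in three GeoRSS encodings.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss"
      xmlns:gml="http://www.opengis.net/gml"
      xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <entry><title>Simple</title><georss:point>45.256 -71.92</georss:point></entry>
  <entry><title>GML</title>
    <georss:where><gml:Point><gml:pos>45.256 -71.92</gml:pos></gml:Point></georss:where>
  </entry>
  <entry><title>W3C</title><geo:lat>45.256</geo:lat><geo:long>-71.92</geo:long></entry>
</feed>"""


def entry_point(entry):
    """Return (lat, lon) from GeoRSS Simple, GeoRSS GML, or W3C geo; else None."""
    simple = entry.find("georss:point", NS)                 # "lat lon"
    if simple is not None:
        lat, lon = simple.text.split()
        return float(lat), float(lon)
    pos = entry.find("georss:where/gml:Point/gml:pos", NS)  # also "lat lon"
    if pos is not None:
        lat, lon = pos.text.split()
        return float(lat), float(lon)
    lat, lon = entry.find("geo:lat", NS), entry.find("geo:long", NS)
    if lat is not None and lon is not None:
        return float(lat.text), float(lon.text)
    return None


points = [entry_point(e) for e in ET.fromstring(FEED).findall("atom:entry", NS)]
print(points)
```

Three code paths for one point is the practical cost of the flavor split described above.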

In addition, GeoRSS hasn’t really advanced in quite a while, despite multiple requests and discussions of extensions for multiple locations, time spans, external geometries, and feature identification.

KML

KML, or Keyhole Markup Language, became a de facto standard through the popularity of Google Earth, formerly Keyhole Earth, and the widespread creation and sharing of geographic data for use inside this compelling 3D geobrowser.

KML offers a rich markup supporting feature locations, attributes, visual styling, 3D models, addresses, and even Atom links. In addition, it is now an OGC standard, and recently Google announced there were more than 500,000 KML files and 2 billion KML placemarks, or features (making an average of 4 placemarks per KML file).

However, KML is very clearly a direct object representation of the Google Earth application. Attribute names follow a rough camel-case convention based on parent or child classes, but sometimes this simple rule is broken in unclear ways, making it difficult for tool developers to create compliant tools. In addition, the styling capability is rudimentary with little true cascading support and no attribute or class styling capabilities.

Google continues to push forward the KML specification with vendor specific, gx:, extensions. The rest of the geospatial community has yet to attempt to influence the spec in any way despite these apparent problems.

GeoJSON

A much more recent community standard, GeoJSON simply adds geographic markup to the JSON format. It is primarily targeted at client-server communication and takes advantage of the compact size and quick evaluation of JSON data.

GeoJSON nominally followed the GeoRSS definitions, making it easy to understand and to leverage existing tools and knowledge. However, JSON itself does not provide any actual format specification or schema definition, leaving clients to determine the layout of the JSON from agreed-upon documentation rather than actual standards. This is becoming especially problematic as more services expose GeoJSON via APIs to third-party developers. Without a schema, such JSON is really little more than arbitrary, unique XML without the extensive syntax.
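A minimal sketch of a GeoJSON Feature with a polygon, built with Python’s json module; the ring and properties are hypothetical. It illustrates two details: coordinates are longitude-first and nested three arrays deep for a polygon (a list of rings, each a list of positions), and compact separators trim some of the punctuation overhead mentioned in the size comparison.

```python
import json


def polygon_feature(ring, properties):
    """Build a GeoJSON Feature from one exterior ring of (lon, lat) pairs."""
    coords = [list(p) for p in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [coords]},  # list of rings
        "properties": properties,
    }


feature = polygon_feature(
    [(-80.0, 43.0), (-79.0, 43.0), (-79.0, 44.0), (-80.0, 44.0)],
    {"name": "example area"},
)
default_text = json.dumps(feature)                         # ", " and ": " separators
compact_text = json.dumps(feature, separators=(",", ":"))  # no optional whitespace
print(len(default_text), len(compact_text))
```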

GML

In response to the history of arbitrary, unique XML, Geography Markup Language, GML, was developed. GML follows a very strict and feature-rich mechanism for creating geographic schemas and domain specific semantics. It is used for very precise data interchange, typically over OGC services like WFS. GML is targeted to bridge the span from 1-D to 4-D geometries, multiple domains, and entirely customizable profiles, or versions, depending on a user or developer’s needs.

With GML’s power comes much complexity. Developers are typically required to devise and include their own unique schema definitions when using GML. The scope of writing a generic GML client is akin to writing a Ruby interpreter, which is daunting to general web developers who only want to add simple geographic capabilities to their services. This complexity hampers its widespread adoption.

Other formats

There are a variety of other formats that are beginning to emerge on the broader web through a variety of fronts.

Spatialite is the set of spatial extensions to the open, portable SQLite format. SQLite is a file database that provides for full relational capabilities in a single file. Spatialite therefore adds geographic columns and rudimentary geospatial query support.

SQLite is already used by a variety of tools, such as Google Gears for offline support and Google Maps on the iPhone for storing tiles. We chose Spatialite for our Geocoder due to its compact nature and deployability. Spatialite makes for a very compelling option when you need access to an entire geographic database and need to perform operations on the data.
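Without the Spatialite extension loaded, plain SQLite can still do the most rudimentary part: a bounding-box query over min/max columns. This sketch swaps in that simpler technique using Python’s built-in sqlite3 module; it is not the Spatialite API, it ignores boxes that cross the antimeridian, and the lake extents are rough, illustrative values.

```python
import sqlite3


def build_index(rows):
    """rows: iterable of (name, min_lon, min_lat, max_lon, max_lat)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE features (name TEXT, min_lon REAL, min_lat REAL,"
        " max_lon REAL, max_lat REAL)"
    )
    db.executemany("INSERT INTO features VALUES (?, ?, ?, ?, ?)", rows)
    return db


def intersecting(db, west, south, east, north):
    """Names whose bounding boxes overlap the query box (west, south, east, north)."""
    # Two boxes overlap unless one lies entirely beyond an edge of the other.
    cursor = db.execute(
        "SELECT name FROM features"
        " WHERE max_lon >= ? AND min_lon <= ? AND max_lat >= ? AND min_lat <= ?"
        " ORDER BY name",
        (west, east, south, north),
    )
    return [name for (name,) in cursor]


db = build_index([
    ("Lake Ontario", -79.8, 43.2, -76.1, 44.2),
    ("Lake Superior", -92.1, 46.4, -84.4, 49.0),
])
print(intersecting(db, -80.0, 42.0, -75.0, 45.0))
```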

GeoPDF is working to become an open format. There is a pending OGC adoption of the georegistration embedding, and Adobe is pushing the ISO 32000 spec that includes how to embed vector and geographic drawing. There is still however a very fragmented ecosystem of tools and interoperability that threatens the format as a mechanism for disseminating geographic data.

CAP, the Common Alerting Protocol, is a realtime-focused format for sharing alerts such as emergency news, earthquakes, or municipal signals. It is still an XML format with no real mechanism for ensuring delivery or timeliness, and its advantages over more broadly used and extended formats such as Atom are not clear.
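For comparison with Atom, a sketch of reading the geographic part of a CAP alert with Python’s ElementTree. The alert is invented, and the sketch assumes the CAP 1.1 namespace; CAP polygons are space-separated latitude,longitude pairs whose first and last points match.

```python
import xml.etree.ElementTree as ET

CAP = {"cap": "urn:oasis:names:tc:emergency:cap:1.1"}

# Invented alert, for illustration only.
ALERT = """<alert xmlns="urn:oasis:names:tc:emergency:cap:1.1">
  <identifier>EXAMPLE-001</identifier>
  <sender>alerts@example.com</sender>
  <sent>2009-11-03T10:00:00-05:00</sent>
  <status>Actual</status><msgType>Alert</msgType><scope>Public</scope>
  <info>
    <category>Met</category><event>Flood Warning</event>
    <urgency>Expected</urgency><severity>Moderate</severity><certainty>Likely</certainty>
    <area>
      <areaDesc>Example river valley</areaDesc>
      <polygon>38.47,-120.14 38.34,-119.95 38.52,-119.74 38.62,-119.89 38.47,-120.14</polygon>
    </area>
  </info>
</alert>"""


def alert_areas(xml_text):
    """Return [(area description, [(lat, lon), ...]), ...] for every polygon."""
    root = ET.fromstring(xml_text)
    areas = []
    for area in root.findall("cap:info/cap:area", CAP):
        desc = area.findtext("cap:areaDesc", default="", namespaces=CAP)
        for poly in area.findall("cap:polygon", CAP):
            ring = [tuple(float(v) for v in pair.split(",")) for pair in poly.text.split()]
            areas.append((desc, ring))
    return areas


print(alert_areas(ALERT))
```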

There are still even more formats that are used in the GeoWeb such as CSV, GPX, RDF, and even OpenStreetMap (OSM). However, it is not really worth discussing these here as they are either too generic (CSV), or still too nascent (OSM) to really consider as an existing GeoWeb standard. They will, however, be discussed later in looking to the future of geodata formats.

Also, Semantic data such as Linked Data, RDF, or OWL are continuing to bubble beneath the surface. I will go in depth later on the potentials of semantic geospatial data standards.

Services

Beyond just data formats, there are a number of GeoWeb service, or interface, standards. Open Geospatial Consortium (OGC) dominates this landscape and provides various querying specifications such as Web Feature Service (WFS) and Web Map Service (WMS) in addition to other cataloguing and location-based service interfaces.

WFS and WMS both provide very full-featured capabilities, but also follow older interface paradigms. Fortunately, neither is SOAP-based; instead they rely on simple query parameters for specifying bounding boxes, layers, formats, and projections. Perhaps the biggest difficulty is that the service description lives at the same endpoint as the service itself, and servers often use the wrong MIME types for documents and errors.
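To illustrate the parameter style, a sketch that builds WMS 1.1.1 GetCapabilities and GetMap URLs with Python’s standard library. The endpoint and layer name are hypothetical; WMS 1.3.0 renames SRS to CRS and changes the axis order for some coordinate systems, which this sketch does not handle.

```python
from urllib.parse import urlencode


def wms_url(endpoint, request, **params):
    """Build a WMS 1.1.1 key-value-pair request URL."""
    query = {"SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": request}
    query.update({key.upper(): value for key, value in params.items()})
    # Leave , : / readable instead of percent-encoding them
    return endpoint + "?" + urlencode(query, safe=",:/")


def getmap_url(endpoint, layers, bbox, width, height, srs="EPSG:4326", fmt="image/png"):
    """bbox is (minx, miny, maxx, maxy) in the units of srs."""
    return wms_url(
        endpoint, "GetMap",
        layers=",".join(layers), styles="", srs=srs,
        bbox=",".join(str(v) for v in bbox),
        width=width, height=height, format=fmt,
    )


endpoint = "http://example.com/wms"
print(wms_url(endpoint, "GetCapabilities"))
print(getmap_url(endpoint, ["lakes"], (-141.0, 41.7, -52.6, 83.1), 800, 600))
```

Both requests go to the same endpoint and differ only in REQUEST, which is the coupling of description and service noted above.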

More recently, general web standards organizations such as the World Wide Web Consortium (W3C) have been adopting geospatial additions for browser DOM geolocation, HTTP location information, and privacy. ISO and OASIS are looking at OpenSearch-Geo for possible integration into their harvesting and cataloging standards.

OpenSearch-Geo follows the same concepts as GeoJSON and GeoRSS, providing a simple extension to a broadly adopted interface and merely adding geospatial components to it. In addition, because it is only a templating specification, it can easily describe general APIs such as Flickr, KML network links, or even WFS when the appropriate template markup is applied.

However, while OpenSearch-Geo has garnered a lot of interest, its actual prevalence isn’t clear. Few services offer an explicit, compliant description of their geospatial search interfaces.

The View is Mixed

So the current state of GeoWeb data standards is quite mixed. There is no denying that they have become mainstream. We’re seeing some emergence, and divergence, of more popular formats in Mapufacture and GeoCommons, both in uploads and in downloads or links. Google has released figures for the KML it has crawled on the web. By contrast, GeoCommons and Mapufacture rely on users to vet and register data sources, providing a different viewpoint into the utility of geospatial data formats on the web.

The above charts show the composition of data uploads, links, and entire composition of geodata uploaded to GeoCommons and Mapufacture. As an interesting comparison, downloads are definitely trending towards lighter weight standards: KML downloads account for 67.8% of all downloads, with CSV’s at 25.7% and Shapefiles for merely 6.3%. It is worth noting that this is merely a narrow viewpoint in the larger web – not accounting for OGC standards, raster data, and services. However it is still an enlightening consideration in looking at how people are actively engaging with the GeoWeb.

The different formats have all been used extensively, but it isn’t clear when each format is most appropriate. This leads many applications to offer multiple formats: an easy and appropriate solution, but also one that can confuse users and lead to duplication. We’ll dive into more general problems in the next article.

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability