

Standards

Map Tiles to go

Published in Data, Standards


Back in February of this year we worked with the World Bank, USAID, and CrisisCommons to deploy a large amount of map imagery and tiles to the Haitian Government and clusters working in relief. We included a forked version of crschmidt’s haitibrowser to work offline on USB sticks.

One of the issues we encountered was the vast number of pre-rendered tile images that needed to be moved to the device. The overall size was not that large – in the hundreds of megabytes. It was the number of files that caused problems when copying and replicating these USB sticks to help spread the data.

I’ve long been an ardent supporter of SQLite and Spatialite as Open Data containers for geospatial data. It’s a portable, offline, open standard, relational data store that provides great access and compression. About a year ago we even added Spatialite support to GeoCommons – so anyone can convert data to a SQLite database.

Almost exactly three years ago, Mikel put OSM on the iPhone after realizing that Apple was using SQLite to store the tile cache for maps. It makes simple sense to put blobs of images inside a table schema for fast storage and retrieval.

Earlier this week Development Seed released a command-line toolset called MBTiles to bundle tiles into SQLite. You can get the source code here. It’s great to finally have the beginnings of a set of tools to better utilize SQLite for storing and sharing tilesets.
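To make the blobs-in-a-table idea concrete, here is a minimal sketch in Python of a tile store in a single SQLite file. The table and column names follow the layout I understand MBTiles to use (a tiles table keyed by zoom, column, and row, plus a metadata table), but treat them as assumptions rather than a reading of the spec. The helper names and the tile bytes are placeholders of my own.

```python
import sqlite3

def create_tile_store(path=":memory:"):
    """Open a SQLite database and create the (assumed) tile tables."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS metadata (name TEXT, value TEXT)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, "
        "tile_data BLOB, "
        "PRIMARY KEY (zoom_level, tile_column, tile_row))"
    )
    return db

def put_tile(db, z, x, y, data):
    """Store one tile image (raw bytes) under its zoom/column/row key."""
    db.execute(
        "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", (z, x, y, data)
    )

def get_tile(db, z, x, y):
    """Return the tile bytes, or None if the tile is not in the store."""
    row = db.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, y),
    ).fetchone()
    return bytes(row[0]) if row else None

# Usage: thousands of tiles end up in one file instead of thousands of files.
db = create_tile_store()
put_tile(db, 2, 1, 3, b"placeholder tile bytes")  # not a real PNG
db.commit()
tile = get_tile(db, 2, 1, 3)
```

The point for the USB-stick problem is that copying one database file avoids the per-file overhead of copying every tile image individually.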

Chris Schmidt has shared his ideas and added support to TileCache for storing tiles in SQLite, so that anyone using TileCache can now easily load tiles offline.

I’m excited to see more adoption of easy mechanisms for interchanging data – raster and vector. We have a couple of ideas and things brewing in how to combine these tiles with other vector data as well as rendering that could really provide some good mechanisms for open spatial data stores.


OGC Mass Market working group discussion from Darmstadt

Published in Standards


In keeping with my history of trying to shed some external perspective on how the OGC works, such as live-blogging the OWS-5 kickoff, the Geospatial search summit, and Google’s libkml announcement – I thought it would be interesting to cover the OGC Mass Market working group telecon that’s being held in Darmstadt, Germany.

OpenSearch-Geo

Pedro Goncalves presented his work taking the OpenSearch-Geo specification and forming it into an OGC-acceptable document. Pedro has been doing great work extending OpenSearch-Geo for accessing earth observation data. I also talked about how OGC services could be described within OpenSearch templating over a year ago in Atlanta.

Unfortunately the presentation and document are currently locked away behind OGC’s portal. Hopefully Pedro will release them publicly. In addition, it’s not clear within the OGC how to adopt such a suggested standard. It’s not part of the OGC and must go through various OGC architecture boards and discussions to be accepted, potentially as a whitepaper.

We wrote OpenSearch-Geo at WhereCamp 2007 and since then it has stayed as a draft standard with various uptake across projects. The adoption as a more formalized standard should have a very positive effect on its adoption.

GeoCommons supports 3 OpenSearch description documents, one each for Finder, Maker, and all of GeoCommons.

In the end, Pedro’s paper was accepted as a “discussion paper”. Hopefully we can push this forward at the next Technical Committee meeting in Mountain View in December – where DeWitt Clinton (OpenSearch original author) will hopefully pop in to push it forward.

The rest of the working group meeting discussed a potential GeoSMS format that ITRI from Taiwan is working on.

Unfortunately, we didn’t get to talking about the potential <geomap> HTML element ideas. There will be more discussion about that on the OGC Mass Market email list.


GeoWeb Standards – Discoverability

Published in Neogeography, Standards


We have a rich history of geography, cartography and GIS that is currently tucked away in top drawers, intranets, and repositories that may not stay online when we most need the data. How do we expose these huge troves of data in a way that can be utilized across various domains? The GeoWeb is all part of the same web: semantic, sensor, social (and interplanetary). So it is vital that the GeoWeb align itself with the web and the multitude of sources and endpoints that the web is reaching into.

There are many possible solutions, and a few that are within easy grasp that we can build our tools to encompass, and develop practices that encourage utilization of these solutions while still moving forward onto better ones as the GeoWeb matures. So we’ll take a few articles to look at specific solutions.

Discoverability

Perhaps the most prevalent issue, and the one that is most easily addressable, is the findability and discovery of geodata on the web. Mano Marks reflected this same sentiment in his blog post on standards.

In thinking about discoverability, there are several primary use cases to consider: machine crawling, human discovery, and tool discovery. Providing data via just a single mechanism means that it doesn’t get utilized and consumed to its potential, and so somewhere along the chain of utilization it will be a burden to actually incorporate into a workflow.

Think of the machines

Machine crawling is the ability for any spider to walk links and find data, metadata, and formats automatically. It’s what Google or GeoNetwork does to find and register data sources.

There was recently a discussion on auto-discovery in the GeoWeb suggesting the use of robots.txt, sitemaps, or embedded META tags in HTML pages.

Consider how a spider would get to a site: it follows a link to a geospatial portal from some blog or resource, or the URL is entered directly as a good place to get data. It does a GET request on the root homepage, “/”, which most likely returns the index.html equivalent. The program then parses through that for links or additional information.

If the spider knows about them, then it may ask for a sitemap.xml or robots.txt. But nothing in the original page request noted that this potentially very complete listing of data was there. This problem is the equivalent of an application having to know that it needs to ask for a GetCapabilities or other method to even discover what is available. Too much implicit knowledge of the specification is required for a program to easily discover new data and services.

What the program does see are these links that can contain information such as a link to a list of available resources. The simplest is a link to the Atom or RSS feed that can simply be a paginated list of all the resources available to the application. Within Atom, there is then the ability to link to various representations of that data in different formats. So applications are able to take the most appropriate format based on what they can consume.

Several years ago I first proposed how KML and GeoRSS could easily support one another via cross-links and with HTML documents. Atom has very nice rel and type attributes that allow for linking to all sorts of different representations. You can even link to OGC services like WMS and WFS using atom links.
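As a rough illustration of how a client could use those rel and type attributes, here is a small Python sketch that reads one Atom entry and picks the representation it prefers. The entry, its example.com URLs, and the pick_link helper are all made up for this example; only the Atom namespace and the MIME types are real.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A hypothetical entry offering the same resource in three representations.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example dataset</title>
  <link rel="alternate" type="text/html" href="http://example.com/data/1"/>
  <link rel="alternate" type="application/vnd.google-earth.kml+xml"
        href="http://example.com/data/1.kml"/>
  <link rel="alternate" type="application/atom+xml"
        href="http://example.com/data/1.atom"/>
</entry>"""

def pick_link(entry_xml, preferred_types):
    """Return the href of the first alternate link matching a preferred type."""
    entry = ET.fromstring(entry_xml)
    by_type = {}
    for link in entry.findall(ATOM + "link"):
        # Atom treats a missing rel attribute as rel="alternate".
        if link.get("rel", "alternate") == "alternate":
            by_type[link.get("type")] = link.get("href")
    for mime in preferred_types:
        if mime in by_type:
            return by_type[mime]
    return None

# A KML-capable client asks for KML first and falls back to HTML.
href = pick_link(ENTRY, ["application/vnd.google-earth.kml+xml", "text/html"])
```

Each client brings its own preference list, so the same entry serves a globe viewer, a feed reader, and a browser without the publisher having to know which one is asking.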

Of particular interest here is the currently approved list of Atom link relation types, which provide basic semantics for telling you what a link means. Is it another page? Just related? It’s a limited set, but one that covers an approachable majority case for developers to begin using.

For example, mechanisms like OpenSearch, specified in a rel="search", simply notify the application that here is a service that it can query to get at additional resources. And with OpenSearch-Geo, a geoweb crawler can query information within a specific location or bounding area.
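Here is a hedged sketch, in Python, of what a crawler might do once it has found such a search link: fill in the URL template from the description document with a bounding box. The template URL is invented, and I am assuming the OpenSearch-Geo draft's geo:box parameter takes west, south, east, north in that order; the fill_template and bbox helpers are my own names.

```python
import re
from urllib.parse import quote

# Hypothetical template, as it might appear in an OpenSearch description
# document. A trailing "?" inside the braces marks an optional parameter.
TEMPLATE = "http://example.com/search?q={searchTerms}&bbox={geo:box?}&page={startPage?}"

def fill_template(template, values):
    """Substitute {name} and {name?} placeholders with URL-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Keep commas readable so the bounding box stays "w,s,e,n".
            return quote(str(values[name]), safe=",")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError("missing required parameter: " + name)
    return re.sub(r"\{([^{}?]+)(\?)?\}", substitute, template)

def bbox(west, south, east, north):
    """Format a bounding box in the order assumed for geo:box."""
    return f"{west},{south},{east},{north}"

url = fill_template(
    TEMPLATE,
    {"searchTerms": "flood maps", "geo:box": bbox(-74.5, 18.0, -71.6, 20.1)},
)
```

The crawler needs no knowledge of the site's own query API; everything it has to know is carried by the template and the shared parameter names.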

Humans need data too

Crawlers are great: they provide a way to pull together information into various other sites and tools to provide customized interfaces to users. However, within any site or tool, how should we expose geodata in a way that humans can easily use for whatever purposes they may have?

Again, links have become a very well understood concept on the Web. That underlined blue line states “beyond me lies an unspecified amount of information about this topic“. However, these links typically imply that they will open another human readable HTML page in the browser. A problem caused by links to media such as geospatial data is that the content behind a link may not be just text, it could be an image, audio, movie, KML, database, or a service. Clicking on that link relies on the browser interpreting the MIME-type (remember the point about how vital mime-types are?) and opening the application the user has specified, or left as default.

So consider what this means for generic media. Clicking open a link to an image probably just opens the image in your browser, and opening a movie loads an embedded video player. Geodata browsers, however, probably don’t have the same install base as, say, Quicktime. Except perhaps Google Earth. The Web has become much more comfortable with clicking on a KML link and seeing Google Earth open up and show the data on a globe.

But something very vital often exists with a link to KML data – a recognizable icon that notifies the user (as they learn) that it is a file that will open in Google Earth, or another KML viewer. This is the same as the very widely used RSS icon.

I discussed this idea before about the geotag icon showing various other formats – and now sites like Data.gov actually show the various data format options.

[Image: Data.gov Raw Data Catalog]

So what we need for GeoWeb standards is some visual representation that tells people they can click on this link and open a spatial relational database, or an OGC service, and perhaps have some confidence that there is an application that will provide them a useful way to access the data. (And I’m still waiting for Sean Gillies’ ISO and Dublin Core icons.)

Of course, we should also employ emergent interfaces that show users the type of data links that are appropriate for them based on their profile or registered MIME-type handlers.

Man-Machine hybrids

So we have discovery links for machine crawlers to register and harvest geodata, and links for humans to click on to follow to data and within data. However, needing to click through every link can easily become overwhelming. Imagine browsing Flickr through lynx.

Browsers already do a lot to assist users in finding relevant extra pieces of data in a page. RSS autodiscovery links show up in URL bars notifying our feed readers that we can subscribe to this page. OpenSearch allows someone to embed this search into their browsers (most of them at least) to easily search the repository later.
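A minimal sketch of that autodiscovery step, in Python: scan a page for link elements and keep the ones that advertise a feed or a search description. The page content and its paths are hypothetical, and the LinkFinder and discover names are mine; a real browser does considerably more than this.

```python
from html.parser import HTMLParser

# Hypothetical homepage of a geodata portal.
PAGE = """<html><head>
<title>Example geodata portal</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/atom+xml" href="/datasets.atom">
<link rel="search" type="application/opensearchdescription+xml" href="/opensearch.xml">
</head><body></body></html>"""

class LinkFinder(HTMLParser):
    """Collect (rel, type, href) for every <link> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        if attr.get("rel") and attr.get("href"):
            self.links.append((attr["rel"].lower(), attr.get("type"), attr["href"]))

def discover(html):
    """Return only the links a feed reader or search client would care about."""
    finder = LinkFinder()
    finder.feed(html)
    return [link for link in finder.links if link[0] in ("alternate", "search")]

found = discover(PAGE)
```

Nothing here depends on knowing a pre-defined file location: the page itself tells the tool where the feed and the search description live.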

The decreasing cost of links

These various approaches for different needs and use cases are all very well aligned. They don’t rely on additional external files that we need to make sure stay up to date or that tools are built to just know that the file can be found at a pre-defined location. Links cost next to nothing, mostly measured in bandwidth sizes, but provide a wealth of accessibility and discovery of geospatial data. Especially data in formats that make sense depending on the tools and use cases for different problems.

Of course, links alone don’t address all the needs of the evolving GeoWeb, they merely provide for the integration of geospatial data with the rest of the web. An important, necessary, but not entirely sufficient first step. We need to consider the actual uses and interfaces of these standards, archival, synchronization, conflation and more.

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability

GeoWeb Standards – Where we need to go

Published in Neogeography, Standards


In previous articles on the status of the GeoWeb I highlighted the myriad of options and problems with current GeoWeb standards and interfaces. Overall, it’s clear that the practice of geospatial data publication and sharing in a web oriented way is still very nascent but getting better at the same time it becomes more mainstream. More data is being created and published in web-oriented ways that make it more consumable and usable.

Too often standards and tools are being built by domain experts and technologists, which leads to overly complex and irrelevant formats that become a burden and introduce as many problems as they are trying to solve.

What they’re often not considering are the end user experiences. Who are the users, what are they trying to achieve, and how can these formats make for better and easier utilization of these tools?

Granted, there are expert users. People who really want to make intricately related, projected, spatially and spectrally bounded queries into data and utilize them in advanced analytics engines. But these are not the majority and they’re not what is driving the long-term demand on the GeoWeb (you can use ‘long-tail’ here if you would like). Who are the users that want to engage with this information on a daily basis in their personal lives, businesses, family, safety, governance, and goals?

Grassroots is an option

I’m a very big fan of grassroots organization and emergent structures. The needs tend to grow from real demand, and solutions are built through actual demonstrated benefit and impact. They are agile, evolutionary, and garner broad support amongst users and developers. These are all aspects that are beneficial to achieving standards that meet the needs of end users and provide good experiences.

However, it is not the only solution. Grassroots tends to look at the immediate needs and may not incorporate more distant issues and expected needs. They seek broad appeal and “good enough” rather than totally encompassing all potential aspects of all interested domains. Top-down, industry derived, committee driven standards provide more directed needs and objectives that can serve different types of users.

So the solution is a hybrid – where grassroots solutions are encouraged as demonstrators and emergent needs – that are then accepted and supported by more formal organizations.

Conversation is required

But we also need to open up the conversation beyond just technologists and experts. We need to be engaging and understanding users – and not merely from the “how do I sell them more of my coffee”, but “what can I do to make their lives better”? And actually asking and engaging with them in dialogues.

This technique of user stories, and engagement is not new or unused. However it appears to be missing from the GeoWeb standards developments. We’ve been designing standards for ourselves first, and then foisting these upon others. Instead, we need to understand their needs and issues, and then apply our expert knowledge in how to approach solutions properly.

Other articles in the GeoWeb Standards series:

  1. Introduction
  2. Where We Are
  3. Problems
  4. Where We Need to Go
  5. Solutions: Discoverability

GeoWeb Standards – Current Problems

Published in Neogeography, Standards


Part 3 in looking at the current state of GeoWeb Standards. See the introduction here.

It’s time to take a hard look across the board at where we’re coming up short and issues that need to be addressed. One way to summarize:

GeoRSS, KML, and GeoJSON are the itching powder, squirting ink pen, and dribble cup of geodata formats.
Sean Gillies

Sean is definitely known for his candor, and his viewpoint definitely has merit. Overall the various formats and standards fulfill various needs, but still don’t provide for all use cases, align well with best practices, or make sense to users and developers.

The simplest overall problem with many of these formats, and how they fit into the Web, is that they lack proper web-type descriptions. One primary mechanism by which Web clients know how to present data is the use of MIME types. MIME types provide a way for the server to notify clients that the data is in a format such as XML, text, a PNG image, and so on. Officially these must be formally registered, but ad-hoc, or vendor-specific, types are also common.

In addition, MIME types allow crawlers and registries to easily record the type of the file in the metadata.

Looking over our list of various GeoWeb standards, it’s very easy to identify which formats abide by this and which don’t.

Atom, JSON, HTML, and SQLite all provide format specific MIME-Types, allowing clients to easily employ the proper applications. However, none provide a special mechanism for notifying that the data includes geospatial markup. Not necessarily a problem, geo shouldn’t be that special.

KML is perhaps the only format that has a geospatial specific MIME-type. However, despite it now being an OGC standard, the MIME type is still the vendor specific one: application/vnd.google-earth.kml+xml. KML was particularly ingenious, though, in also providing for the compressed, or zipped, format as a unique MIME-type: application/vnd.google-earth.kmz.

GML is just XML, so that is entirely not useful in notifying a client that it should try and pass this onto a geo-enabled application. And Shapefiles are an agglomeration of multiple files, and even zipped up are only marked as compressed files.

More broadly in services, the OGC has a mime type for service descriptions and responses: application/vnd.ogc.wms_xml, though errors have their own MIME-types: application/vnd.ogc.se_xml.

OpenSearch has a special MIME Type, and obviously Tiles and Image files have MIME-types.
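To show what this looks like from a client's side, here is a small Python sketch that registers the KML and KMZ types named above and then asks what type a given filename would be served as. The filenames are invented, and the .atom extension mapping is my own assumption for illustration; the zip result comes from Python's built-in defaults.

```python
import mimetypes

# MIME types taken from the text above, keyed by an assumed file extension.
GEO_TYPES = {
    ".kml": "application/vnd.google-earth.kml+xml",
    ".kmz": "application/vnd.google-earth.kmz",
    ".atom": "application/atom+xml",
}

def build_registry():
    """Start from Python's built-in type table and add the geo formats."""
    registry = mimetypes.MimeTypes()
    for extension, mime in GEO_TYPES.items():
        registry.add_type(mime, extension)
    return registry

def mime_for(registry, filename):
    """Return the MIME type a client would see for this file, or None."""
    mime, _encoding = registry.guess_type(filename)
    return mime

registry = build_registry()
kml_type = mime_for(registry, "places.kml")    # a geo-specific type
shp_type = mime_for(registry, "boundaries.zip")  # a zipped Shapefile is just a zip
```

The contrast between the two results is the problem in miniature: the KML file announces itself as something a geo application can open, while the zipped Shapefile gives a client no hint that there is geodata inside.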

Doesn’t matter if you can’t download it

Another major issue facing many of the GeoWeb formats is their file size. Generally, the web bounces back and forth between disregarding sizes due to assumed, ubiquitous high-speed and reliable connectivity, and trying to speed up pages. But even more important is the fact that many potential users don’t have access to high-speed internet and so there is a huge difference between 10k and 100k or 1MB of data.

To compare the sizes, I took a relatively large dataset from GeoCommons, Statistics Canada, Land and freshwater area, Canada, 2005 and exported it in a variety of formats, both uncompressed, and compressed via standard zip algorithms.

Format        Size      Zipped
CSV           1.3 KB
Shapefile     5.4 MB    3.6 MB
GeoRSS        3.3 MB    1.1 MB
KML           7.3 MB    2.4 MB
Spatialite    5.4 MB    3.6 MB
JSON          7.9 MB    2.3 MB

CSV just includes latitude and longitude columns of the centroid – so obviously not fully representative. An option would be to include the EWKB in a column for the full geometry – but that is far from any kind of ‘standard’ that other tools would know how to interpret.

Perhaps most surprising from these results is that JSON is so large. Unfortunately, representing complex geometries requires a lot of syntax, and that adds up quickly for polygonal data.
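A quick way to see where that overhead comes from is to encode the same ring of coordinates two ways and count bytes. This Python sketch is not how the table above was produced; the synthetic ring, the comparison against a WKT-style string, and the use of zlib as a stand-in for zip are all my own choices for illustration.

```python
import json
import math
import zlib

def ring(n, cx=-75.0, cy=45.0, radius=1.0):
    """Build a closed ring of n points around a made-up centre."""
    points = [
        (
            round(cx + radius * math.cos(2 * math.pi * i / n), 6),
            round(cy + radius * math.sin(2 * math.pi * i / n), 6),
        )
        for i in range(n)
    ]
    points.append(points[0])  # close the ring
    return points

def as_geojson(points):
    """Encode the ring as a GeoJSON-style Polygon geometry."""
    return json.dumps({"type": "Polygon", "coordinates": [[list(p) for p in points]]})

def as_wkt(points):
    """Encode the same ring as a WKT-style string for comparison."""
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in points) + "))"

def sizes(text):
    """Return (raw bytes, zlib-compressed bytes) for a string."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw))

points = ring(1000)
json_raw, json_zipped = sizes(as_geojson(points))
wkt_raw, wkt_zipped = sizes(as_wkt(points))
```

Every coordinate pair in the JSON form carries its own brackets and comma, which is small per point but adds up over thousands of vertices. Compression recovers much of it, which fits the pattern of the zipped column above.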

Linkability, Durability, and Discoverability

Moving past purely file format and data type specifications brings up the issue of discoverability and linkability in GeoWeb standards. The Web is more than a list of documents that mention resources; those documents can actually link to durable endpoints that can be resolved, queried, accessed, and parsed.

Non-web native formats have no concept of linking. CSV, Shapefiles, and SQLite contain data, but no links. By contrast, Atom, GML, and KML are chock-full of links, although not always used to great effect. JSON can contain links, but without a schema, who knows what the link means.

Obviously the best model to follow here is HTML, which provides automatic links to feeds, OpenSearch description documents, pages, media, styles, and scripts.

However, what happens when a resource disappears and is no longer resolvable? How do you know where else to get another version of the same data, and is it the same data? This is becoming a big problem in the larger web, made more problematic by the use of URL shorteners, but also especially disconcerting when it affects the provenance and accuracy of geospatial data.

But without Complexity

While linkability, durability, and discoverability are vital to GeoWeb standards, the cost of complexity inhibits adoption and probability of support.

This is a long argument in many circles – often made more difficult by practitioners that have been working in a field for years or decades and consider the most opaque formats or concepts commonplace. Look to the OWL/RDF/SemanticWeb space for an example of how there is a mismatch between proponents and the general public.

A standard needs to have clear value to developers and users for it to even begin to be considered. No one is going to dive into a dense specification of a format without even knowing why they would want to use it or how it fits into workflows and architecture.

And complexity can also surface in small ways – inconsistent capitalization of element names (you know who you are, KML), or by supporting a plethora of similar, but different flavors making it unclear which to use (GeoRSS).

Tools

In this last section of the overall problems we’re facing with GeoWeb standards, the most prevalent, and easy to address, is the lack of tools that interact and convert between these formats. Really, formats don’t matter to users – they have data from one source such as their camera, PND, blog posts, Government agency, etc. and they want to do something with it like understand what’s going on around them, find their favorite restaurant, save the rainforest, provide services, get their car fixed, or just share stories with their family.

Easy to use, engaging, and data agnostic tools are vital for adoption of any formats. Again you only have to look as far as KML’s meteoric rise from application specific format to perhaps the most ubiquitous, and growing, GeoWeb standard due to the compelling reason of “I want to see my house and things going on around the world”.

Why do none of the major RSS news readers really support GeoRSS? Every site should offer KML and Atom output of their data. Mobile devices should allow me to open any of my data from any of my services in whatever mapping interface or app I choose.

Missing Middle Ground

[Image: GeoWeb Standards - Missing Middle Ground]

Amongst the plethora of formats, we’re really missing some middle ground. Each of these formats is quite independent of the others, with little cross-pollination and linking occurring.

  • Why can’t my KML file link to Atom updates and also to other formats?
  • Can OpenSearch describe my tile pyramid?
  • How do I describe my path through life, media, events, places I’ve lived, worked, and people I’ve known?

We too easily get caught in either “this format must solve all possible problems” or “it’s good enough so why change it”. In between we need to converge to understand use cases, and how these formats and specifications can cross various barriers – connecting the experts with the amateurs, the citizens and the authorities, one with another.

GeoWeb Standards Series

  1. Introduction
  2. Where We Are
  3. Problems