KML - “a little less than a year”
Well, it’s official. And as reported here: “The whole process is anticipated to take less than a year.” 363 days later, that statement holds true.
There has been a meme floating around about the new “Geotag Icon” that was originally proposed here and now has an official site: Geotag Icon Project
There has been a lot of dialog. Sean discusses many of his thoughts about semantic interoperability and formats. There have also been a number of discussions on the design itself - everything from the color to the pushpin being indicative of points only, which may reinforce the “red dot fever” that plagues many maps.
These are really minor quibbles. Overall I think it’s a decent design that gives some simple meaning to what the icon conveys. However, the problem I do have is the Usage guidelines & examples. Essentially, they are saying it should be used for all geospatial formats.
Example from the site:
Whereas the Geotag Icon describes a general concept (”This item is geotagged”) the KML icon and GeoRSS favicon each proclaim a file format. This is analogous to the Feed Icon: can you imagine having a different orange icon for each web feed format? There’s no reason why the Geotag Icon can’t sit side-by-side with file format icons if that’s what folk wish to do. But a well-recognized Geotag Icon (in time!) adjacent to the text description “Download KML file (opens in Google Earth)” could well be more informative to the majority users than what is otherwise sure to be a growing set of vaguely-related file format icons with which to become familiar. The power of de facto standard icons is in instant recognition—and the fewer the merrier!
I disagree. He’s proposing that this one icon be used for a multitude of different formats that each have different capabilities and uses. It’s not like the difference between RSS and Atom, it’s the difference between HTML and RSS or CSS. Or a Video and a Photo. Sure, they’re both images, but they’re also very different in what they do.
He’s creating additional confusion by using the Geotag icon for GeoRSS. GeoRSS isn’t even a file format, it’s an extension to another file format: RSS / Atom, and they already have a recognizable icon that has meaning to users. I wouldn’t want to put yet another icon in front of them that meant something slightly different. And KML is a visualization format, similar to HTML + CSS. GPX is a very specific format that works for handheld GPS units and PNDs. I’m surprised the Geotag Icon wasn’t proposed to be used for Geo and Adr Microformats, since it matches this formula of all things geo.
This is the folly of the greater GIS community - assuming something is primarily geo first, and general information second. I’m surprised this idea is also followed by people outside the GIS world.
So I only ask that the Usage guidelines of the Geotag icon be scaled back. It’s interesting that it’s been incorporated into Minimap Sidebar - good idea, but perhaps again confusing application with format? Using it in a photograph or video is nice because it’s clear to me that the format is a video (and I don’t care if it’s mov, fla, et al.) and useful to be alerted that it has geocoded content inside. I also think it could be useful as a link to a page of Geospatial formats. Why not even use it like the Share this on… on the Geotag project page itself?
Map this with
KML,
GeoRSS,
GPX
The GPX icon is from Garmin’s Communicator Plugin. You could optionally replace the format names (like KML) with suggested applications, but I find this a little too vendor specific. Don’t you dislike it when people say things like “I opened the Internet Explorer page…”?
I think this set of links is how I would do it in GeoPress. But don’t suggest that Geotag Icon become the over-arching marker for other formats that happen to contain geo-data. Otherwise, I’ll be suggesting a family of icons like
Timetag, and
Titletag.
At the OGC Technical Committee meeting today in St. Louis, Google pushed out the initial release of an open-source library for parsing and publishing KML. Read more about it on the Google Open Source Blog.
libkml was originally “announced” about 6 months ago as part of the kick-off of the standardization of KML within the OGC.
libkml is interesting in several ways. KML itself is just an XML specification for geographic data. Nothing really special compared to other XML formats. However, as I’ve championed, there is a big difference between the developers who use and read schemas and those who use libraries or simple examples and documentation to implement parsers or tools. This is justified in that developers (both consumers and producers as discussed here) are usually trying to solve some other problem and want to use a format like KML merely as a mechanism to publish and visualize their information. By providing a stable and full-featured library, developers are free to build tools around the library without having to deal with the intricacies and issues of the format itself.
Just as opening the standardization of KML to the OGC led other organizations like Microsoft to embrace the format, an open-source library also encourages other implementations, or competitors, of KML applications. Google is primarily in the business of data organization and search, so every additional tool that publishes or uses a format they can then index is a win.
Another implication of libkml is that a single library can grow with versions and features, again freeing the developer from having to track future versions or bug fixes to the format.
Lastly, libkml is written to be fast - which is essential for handling large KML documents, realtime visualization, and potentially even mobile/limited-resource clients. However, how small libkml can be made remains to be seen.
As Michael Ashbridge pointed out, this is a very “alpha release, not Beta in the Google sense”. In fact, in the documentation there is the very clear disclaimer: “THIS IS ALPHA SOFTWARE. Expect changes. We do not yet recommend use in production code.”
There are still a number of features that are not yet implemented that are forthcoming, or can be accomplished by the broader community. They’re looking for feedback from developers on the interface and functionality. The library is C++, with SWIG bindings currently in Ruby, Java, Python, Perl and PHP. There are examples for developers to get up and running quickly.
It’s released under the new BSD license. It is meant to be as open as possible for developers to use in both open-source and closed-source projects without worrying about interference with other licenses.
It’s great to see Google pushing on the open-{source,format} in geospatial. They’ve obviously done a lot to raise public awareness of placemarking and geospatial data with GoogleMaps and GoogleEarth - they’re now engaging the GIS community and helping them.
Hopefully people, at least developers and users in the know, can soon stop referring to KML strictly as “GoogleEarth format” or “GoogleEarth Layer”.
An issue we commonly run into is the reality that there are a lot of KML and other data sources in the wild that are malformed. There is the common response “it works in GoogleMaps, why doesn’t it work elsewhere?”
libkml is able to handle, to some extent, ‘bad’ KML, but is very strict in outputting KML that is generated using the DOM API in the library. Hopefully this generally raises the quality of available KML.
A potential extension to libkml that excites me would be the ability to ingest a KML document and publish it out as other formats such as GeoRSS or GML. That would be especially useful if a higher-level interface were built onto libkml that abstracted away the specifics of KML and instead provided an interface for general geometry (and feature) creation and manipulation.
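To make the idea concrete, here is a minimal sketch of that kind of conversion. It does not use libkml at all; it uses Python's standard XML parser, handles only Point placemarks, and assumes the KML 2.2 namespace `http://earth.google.com/kml/2.2`. The function name and the sample placemark are made up for illustration. The one real subtlety it shows is the coordinate order: KML writes `lon,lat[,alt]`, while GeoRSS Simple writes `lat lon`.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.2"  # assumed namespace for the input KML


def kml_points_to_georss(kml_text):
    """Return (name, 'lat lon') pairs for every Point placemark in a KML string."""
    root = ET.fromstring(kml_text)
    ns = {"k": KML_NS}
    results = []
    for placemark in root.iter("{%s}Placemark" % KML_NS):
        name = placemark.findtext("k:name", default="", namespaces=ns).strip()
        coords = placemark.findtext("k:Point/k:coordinates", namespaces=ns)
        if coords is None:
            continue  # lines and polygons are out of scope for this sketch
        lon, lat = coords.strip().split(",")[:2]  # KML order: lon,lat[,alt]
        results.append((name, "%s %s" % (lat, lon)))  # GeoRSS Simple order: lat lon
    return results


# Hypothetical sample placemark, not taken from any real feed.
SAMPLE_KML = """<kml xmlns="http://earth.google.com/kml/2.2">
  <Placemark>
    <name>Example place</name>
    <Point><coordinates>-90.1994,38.6270,0</coordinates></Point>
  </Placemark>
</kml>"""

for name, point in kml_points_to_georss(SAMPLE_KML):
    print("<georss:point>%s</georss:point>  <!-- %s -->" % (point, name))
```

A real converter would also need to carry over lines, polygons, descriptions and styling decisions, which is exactly where a higher-level geometry interface would help.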
Unfortunately since my laptop hard drive died last week, I don’t have a development machine to build and play with this yet. But I expect to use this library in a number of projects.
A commonly requested feature addition to GeoRSS has been multiple locations per entry. Currently, GeoRSS only adds a single geometry per RSS or Atom entry. This was pragmatic and served the general goals of GeoRSS.
There are several commonly encountered use cases. News reports typically mention several locations. Bloggers using GeoPress may tell a story about a trip and want to reference several spots along their trip - especially if they are documenting a tour that includes a path and sites along that path such as in EveryTrail. Dan Schultz talks about why “One Location Doesn’t Cut It”, citing other examples from news journalism.
Adrian and I recently sat down together to quickly brainstorm on what this may look like. The features we were looking to add were: multiple geometries, an excerpt for each geometry, and a toponym for each location (venue, city, etc.). Additionally, we didn’t want to break current compatibility.
Other services are already including multiple locations in different ways. Flickr outputs a single location in two different formats of GeoRSS: Simple, and some odd form of deprecated W3C. MetaCarta’s RSS-to-GeoRSS converter currently just dumps multiple locations into the entry, but without identifying if these are unique locations, or just variations in format type or hierarchy.
We wanted to call out that this is in fact a different type of geometry - a multi-geometry. Both KML and WKT support multi-geometry, but without being able to reference what the points are individually about. That’s useful if you are, say, marking all the holes in a field, but not for narratives.
Another feature we wanted to try and support was to be able to reference geometries stored elsewhere. Currently in GeoRSS feeds you’ll typically see references to a City or Country just include a point to the center of that geography. Not really indicative of what the article was about, or useful when trying to find all the geographic data about an area. So it’s important to include lines and areas as appropriate. However, including huge outlines of states or nations, potentially multiple times within a single feed, can have drastically bad consequences of increasing feed file size and complexity.
Here is a snippet of what we are proposing:
<description>
We went to visit downtown Cedarburg before
the conference. Had some great sandwiches at Joe's.
If you haven't been to Cedarburg, Wisconsin, then
you haven't really experienced the MidWest...
</description>
<georss:collection>
<georss:point
excerpt="Went to visit downtown Cedarburg..."
featurename="Downtown Cedarburg, Wis.">
43.296700 -87.987500
</georss:point>
<georss:polygon
rel="geometry"
src="http://geonames.org/geometries/5867680"
excerpt="..."
featurename="Cedarburg, Wisconsin"
type="application/vnd.google-earth.kml+xml"/>
<georss:line
featurename="Convention Center">
43.296700 -87.987500 43.3 -88 -44 -89
</georss:line>
</georss:collection>
The first part to notice is that we wrapped the multiple geometries in a georss:collection. This allows current parsers to not be confused by encountering multiple georss elements unwrapped and being unclear if they are multiple representations of the same geometry, or different geometries.
We also included an excerpt attribute that allows you to include some text referencing what this location is specifically about. This can be text from the article itself, or some other useful information. One concept we had considered was pointing to the relevant text inside the article itself, but using an attribute of one element to reference text embedded in another element seemed burdensome and prone to problems.
The second element is a georss:polygon that includes a src reference to the geometry stored elsewhere. The rel attribute specifies that it is the geometry of this element, and the type helps the tool know what the representation is of the stored geometry. This way a tool that is consuming the GeoRSS can go and fetch the geometry if it wants, or if it already has a cached version, say referenced elsewhere in this same feed, then it doesn’t have to request it again.
Of course, with any standards development, it is useful to consider how a user interface might provide for including multiple locations in an entry. Here is a mockup of how I imagine a simple interface would appear, and probably how we’d implement it in something like GeoPress:
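As a rough illustration of how a consumer might handle this, the sketch below reads a georss:collection, parses inline geometries directly, and fetches referenced geometries at most once per src by keeping them in a cache. Keep in mind that georss:collection and the excerpt, featurename, rel, src and type attributes are only the proposal described here, not part of published GeoRSS. The function names, the wrapping entry element and the stub fetcher are assumptions made for the example; a real tool would make an HTTP request where the stub is.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"

# The proposed markup, wrapped in a bare entry element so the prefix is declared.
ENTRY = """<entry xmlns:georss="http://www.georss.org/georss">
  <georss:collection>
    <georss:point excerpt="Went to visit downtown Cedarburg..."
                  featurename="Downtown Cedarburg, Wis.">
      43.296700 -87.987500
    </georss:point>
    <georss:polygon rel="geometry"
                    src="http://geonames.org/geometries/5867680"
                    excerpt="..."
                    featurename="Cedarburg, Wisconsin"
                    type="application/vnd.google-earth.kml+xml"/>
  </georss:collection>
</entry>"""


def read_collection(entry_xml, fetch, cache):
    """Return one dict per geometry in the entry's georss:collection.

    fetch(url, mime_type) is called only for src values not already in cache.
    """
    root = ET.fromstring(entry_xml)
    collection = root.find("{%s}collection" % GEORSS_NS)
    locations = []
    if collection is None:
        return locations
    for geom in collection:
        kind = geom.tag.split("}", 1)[1]  # "point", "line", "polygon", ...
        src = geom.get("src")
        if src is not None:
            if src not in cache:  # reuse a geometry already seen in this feed
                cache[src] = fetch(src, geom.get("type"))
            geometry = cache[src]
        else:
            # Inline geometry: whitespace-separated "lat lon" values.
            geometry = [float(v) for v in (geom.text or "").split()]
        locations.append({
            "kind": kind,
            "name": geom.get("featurename"),
            "excerpt": geom.get("excerpt"),
            "geometry": geometry,
        })
    return locations


fetched = []  # records which URLs the stub was asked for


def stub_fetch(url, mime_type):
    """Stand-in for a real HTTP fetch; returns a placeholder instead of a geometry."""
    fetched.append(url)
    return "<geometry from %s as %s>" % (url, mime_type)


shared_cache = {}
for loc in read_collection(ENTRY, stub_fetch, shared_cache):
    print(loc["kind"], "-", loc["name"], "-", loc["geometry"])
```

Passing the same cache dictionary for every entry in a feed is what keeps a state or country outline from being requested more than once.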
Article: We went to visit downtown Cedarburg before the conference. Had some great sandwiches at Joe’s. If you haven’t been to Cedarburg, Wisconsin, then you haven’t really experienced the MidWest…
Locations:
- Excerpt: Went to visit downtown Cedarburg…
- Type: Point
- Geometry: 43.296700 -87.987500
- Name: Cedarburg, Wis.
To promote ideas and discussion around these and other proposals, I’ve created proposals at GeoRSS.org on multiple location and referencing external geometry. Please let us know what you think about the idea and format. We know that we can’t please everyone, but like the origins of GeoRSS, we’re just trying to address a real need with a simple format.
The Google LatLon blog points to the EPA’s release of chemical emission facilities in KML.
This is a really great step. I’ve looked at the EPA data and started writing various scripts to massage it into a nicer format than their CSV. A lot of work, a lot of data, and a pain to maintain. So it’s great that institutions are exposing their data in easy to consume and view formats.
However, it also illustrates the typical ‘munging’ of styling and data that users, even professional users like the EPA, will do with KML. In the Air Emission KML they are using Altitude to display some measure of chemical emission. It isn’t clear what the number really means, and seems really bad to have a data measurement in the geometry.
This matters especially for other clients that will consume this data. How are they to know whether the height is really the altitude of a facility, or some other representative number? In fact, you can export the facilities KML with altitude representing NOx, Lead, Particulate Matter, Sulfur Dioxide, or Volatile Organic Compounds, and there isn’t a way to tell which one it was once you’ve exported. Unfortunately KML doesn’t currently support the ability to style based on ExtendedData, other than for the Balloon text. But it would still be useful to put this data there.
For example, in Mapufacture we pull in and store the arbitrary data with a KML or RSS feed. So then a user could add, for example, all Emission facilities with a high NOx emission rate within 5 miles of their house to the map of their community.
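To show why having the measurement as named data matters, here is a small sketch of that kind of query. It is not how Mapufacture works internally; it simply assumes the emission values have already been read into plain records with a name, a latitude and longitude, and a NOx value. The home location, the threshold of 10 and two of the three facilities are hypothetical, and since the EPA file doesn't state its units, the threshold is only a placeholder. Distance is computed with the haversine formula.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def high_nox_nearby(facilities, home, radius_miles, nox_threshold):
    """Return names of facilities within radius_miles of home whose NOx exceeds the threshold."""
    home_lat, home_lon = home
    return [
        f["name"]
        for f in facilities
        if f["nox"] > nox_threshold
        and miles_between(home_lat, home_lon, f["lat"], f["lon"]) <= radius_miles
    ]


FACILITIES = [
    # Values from the EPA placemark discussed in this post.
    {"name": "Sam Hill Oil Co", "lat": 39.9754, "lon": -104.8257, "nox": 13.102642872},
    # Hypothetical records added only to exercise the filter.
    {"name": "Hypothetical low emitter", "lat": 39.9800, "lon": -104.8300, "nox": 1.5},
    {"name": "Hypothetical distant emitter", "lat": 40.5000, "lon": -105.0000, "nox": 50.0},
]

HOME = (39.98, -104.82)  # hypothetical house location

print(high_nox_nearby(FACILITIES, HOME, radius_miles=5, nox_threshold=10))
```

None of this is possible if the only place the NOx value lives is the third number of the coordinates.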
An answer could be a link for each element to the data in a different markup such as GML, but that seems slightly convoluted and difficult for tool developers. That would still be a good solution for advanced data and metadata specifying how and when the data was collected. But the point here is to publish a base level of information in a light-weight, broadly used data format.
A couple of other small niggles with their KML: the title is just “Temporary Places”, which says nothing about what the data really is or where it came from. The title should be representative of the data, so that when I leave it in my KML viewer or ingest it into another tool, I can remember what the data is about. KML 2.2 also supports attribution and atom links that should point back to the EPA site. Also, all facility names are all caps, which is how the data was stored in their old CSV files (when I looked before).
By way of specific discussion, let’s do a simple example of modifying the EPA’s current KML to a more useful KML.
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Folder>
<name>Temporary Places</name>
<open>1</open>
<Document>
<name>Petroleum_facilities.kml</name>
<Placemark>
<name>SAM HILL OIL CO:<br></br> 725 S MAIN ST BRIGHTON CO 80601-3047</name>
<description><![CDATA[<p>SIC: 2911<br />
Petroleum Refining And Related Industries Petroleum Refining Petroleum Refining </p>
<p>NAICS: <br />
</p>
<img src="http://www.epa.gov/cgi-bin/broker?_service=data&_program=dataprog.dw_emisplot_epad8_facility_caps.sas&year1=2002&year2=2002&debug=0&site=15121"width="460" height="360">
<hr><p><b>Petroleum Facilities Emissions (258 facilities)</b>]]></description>
<styleUrl>#A</styleUrl>
<Point>
<extrude>1</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<coordinates>-104.8257,39.9754,13.102642872</coordinates>
</Point>
</Placemark>
</Document>
</Folder>
</kml>
So let’s clean this up a little by adding some better titles, links to more information, and putting the data in a data location instead of with geometry.
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Folder>
<name>EPA Air Emission Sources</name>
<description>Under the Clean Air Act, EPA establishes air quality standards to protect public health, including the health of "sensitive" populations such as people with asthma, children, and older adults. EPA also sets limits to protect public welfare. This includes protecting ecosystems, including plants and animals, from harm, as well as protecting against decreased visibility and damage to crops, vegetation, and buildings.</description>
<atom:link href="http://www.epa.gov/air/emissions/" />
<atom:author>
<atom:name>US Environmental Protection Agency (EPA)</atom:name>
</atom:author>
<open>1</open>
<Document>
<name>Petroleum Facilities</name>
<atom:link rel="self" type="application/kml+xml" href="http://www.epa.gov/mxplorer/Petroleum_facilities_US.kmz"/>
<Placemark id="sic:2911">
<name>Sam Hill Oil Co</name>
<description><![CDATA[
<img src="http://www.epa.gov/cgi-bin/broker?_service=data&_program=dataprog.dw_emisplot_epad8_facility_caps.sas&year1=2002&year2=2002&debug=0&site=15121" width="460" height="360">]]></description>
<styleUrl>#petroleum_facility</styleUrl>
<address>725 S Main St, Brighton, CO 80601-3047</address>
<ExtendedData>
<Data name="SIC">
<value>2911</value>
</Data>
<Data name="industryDescription">
<value>Petroleum Refining And Related Industries Petroleum Refining Petroleum Refining</value>
</Data>
<Data name="facilityType">
<value>Petroleum</value>
</Data>
<Data name="NOx (ppm)">
<value>13.102642872</value>
</Data>
</ExtendedData>
<Point>
<coordinates>-104.8257,39.9754</coordinates>
</Point>
</Placemark>
</Document>
</Folder>
</kml>
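The core of that cleanup, moving the measurement out of the geometry, is mechanical enough to script. The following is a minimal sketch using Python's standard XML library, not anything the EPA or Google provides. The function name is made up, and the caller has to supply the measure name (here "NOx") because, as noted above, the exported file doesn't say which pollutant the altitude stood for.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize KML elements without a prefix


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return "{%s}%s" % (KML_NS, tag)


def move_altitude_to_data(placemark, measure_name):
    """Move the third coordinate of a Point placemark into ExtendedData."""
    point = placemark.find(q("Point"))
    if point is None:
        return
    coords = point.find(q("coordinates"))
    parts = coords.text.strip().split(",")
    if len(parts) < 3:
        return  # no altitude component, nothing to move
    lon, lat, value = parts[:3]
    coords.text = "%s,%s" % (lon, lat)
    # extrude and altitudeMode only existed to draw the value as a height.
    for tag in ("extrude", "altitudeMode"):
        child = point.find(q(tag))
        if child is not None:
            point.remove(child)
    extended = placemark.find(q("ExtendedData"))
    if extended is None:
        extended = ET.Element(q("ExtendedData"))
        # Place ExtendedData just before the geometry.
        placemark.insert(list(placemark).index(point), extended)
    data = ET.SubElement(extended, q("Data"), {"name": measure_name})
    ET.SubElement(data, q("value")).text = value


# A trimmed copy of the EPA placemark from the original file.
SOURCE = """<Placemark xmlns="http://earth.google.com/kml/2.2">
  <name>Sam Hill Oil Co</name>
  <Point>
    <extrude>1</extrude>
    <altitudeMode>relativeToGround</altitudeMode>
    <coordinates>-104.8257,39.9754,13.102642872</coordinates>
  </Point>
</Placemark>"""

placemark = ET.fromstring(SOURCE)
move_altitude_to_data(placemark, "NOx")
print(ET.tostring(placemark, encoding="unicode"))
```

Titles, attribution links and name casing still need a human decision, but a script like this could be run over every exported file.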
To summarize, the EPA releasing their data in KML is a really great step toward leading the way on information transparency and public awareness. However, there should be just a tiny bit more effort to demonstrate better behavior in their shared data.