EPA KML: a great step, but both forward and backward

The Google LatLon blog points to the EPA's release of chemical emission facilities in KML.

This is a really great step. I've looked at the EPA data before and started writing various scripts to massage it into a nicer format than their CSV. It was a lot of work on a lot of data, and a pain to maintain. So it's great that institutions are exposing their data in formats that are easy to consume and view.

However, it also illustrates the typical 'munging' of styling and data that users, even professional users like the EPA, will do with KML. In the Air Emission KML they are using altitude to display some measure of chemical emission. It isn't clear what the number really means, and it seems like a bad idea to store a data measurement in the geometry.

This is especially a problem for other clients that will consume this data. How are they to know whether the height is really the altitude of a facility or some other representative number? In fact, you can export the facilities KML with altitude representing NOx, Lead, Particulate Matter, Sulfur Dioxide, or Volatile Organic Compounds, and there isn't a way to tell which one you chose once you've exported. Unfortunately, KML doesn't currently support styling based on ExtendedData, other than for the Balloon text. But it would still be useful to put this data there.
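To make the point concrete, here is a minimal Python sketch of how a consuming client could read a measurement from ExtendedData instead of guessing at what the altitude means. The sample placemark and the field name `NOx` are assumptions for illustration, not the EPA's actual schema.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical placemark: the "NOx" field name is an assumption.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Sam Hill Oil Co</name>
    <ExtendedData>
      <Data name="NOx"><value>13.102642872</value></Data>
    </ExtendedData>
    <Point><coordinates>-104.8257,39.9754</coordinates></Point>
  </Placemark>
</kml>"""


def read_extended_data(placemark):
    """Return {name: value} for every <Data> element in a placemark."""
    fields = {}
    for data in placemark.findall("kml:ExtendedData/kml:Data", KML_NS):
        value = data.find("kml:value", KML_NS)
        fields[data.get("name")] = value.text if value is not None else None
    return fields


root = ET.fromstring(SAMPLE)
placemark = root.find("kml:Placemark", KML_NS)
fields = read_extended_data(placemark)

# The measurement is labeled, and the geometry carries only lon,lat.
coords = placemark.find("kml:Point/kml:coordinates", KML_NS).text.split(",")
print(fields, coords)
```

With the measurement named in the data, a client no longer has to infer from the geometry which chemical was exported.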

For example, in Mapufacture we pull in and store the arbitrary data that comes with a KML or RSS feed. A user could then add, for example, all emission facilities with a high NOx emission rate within 5 miles of their house to the map of their community.
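As a rough illustration of that kind of query (not Mapufacture's actual implementation), the Python sketch below filters facility records by an emission threshold and by great-circle distance from a home location. The record layout, the home coordinates, the two extra facilities, and the threshold of 10 are all made up for the example; only the Sam Hill Oil Co coordinates and value come from the EPA file.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby_high_emitters(facilities, home, field, threshold, radius_miles=5.0):
    """Facilities whose `field` value is >= threshold within radius of home."""
    home_lat, home_lon = home
    matches = []
    for facility in facilities:
        value = facility["data"].get(field)
        if value is None or value < threshold:
            continue  # measurement missing or too low
        distance = haversine_miles(home_lat, home_lon,
                                   facility["lat"], facility["lon"])
        if distance <= radius_miles:
            matches.append(facility)
    return matches


# Hypothetical records, as they might look after ingesting ExtendedData.
facilities = [
    {"name": "Sam Hill Oil Co", "lat": 39.9754, "lon": -104.8257,
     "data": {"NOx": 13.102642872}},
    {"name": "Example Low Emitter", "lat": 39.98, "lon": -104.83,
     "data": {"NOx": 0.5}},
    {"name": "Example Distant Refinery", "lat": 40.5, "lon": -105.5,
     "data": {"NOx": 50.0}},
]

home = (39.99, -104.82)  # made-up home location
matches = nearby_high_emitters(facilities, home, "NOx", threshold=10.0)
print([f["name"] for f in matches])
```

None of this is possible when the only copy of the measurement is an unlabeled altitude.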

One answer could be a link from each element to the data in a different markup such as GML, but that seems slightly convoluted and difficult for tool developers. It would still be a good solution for advanced data and for metadata specifying how and when the data was collected. But the point here is to publish a base level of information in a lightweight, broadly used data format.

A couple of other small niggles with their KML: the title is just "Temporary Places", which says nothing about what the data really is or where it came from. The title should be representative of the data, so that when I leave it in my KML viewer or ingest it into another tool, I can remember what the data is about. KML 2.2 also supports attribution and atom links, which should point back to the EPA site. Also, all facility names are in all caps, which is how the data was stored in their old CSV files (when I looked before).

By Example


To make the discussion specific, let's work through a simple example of modifying the EPA's current KML into a more useful KML.




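<Document>
  <name>Temporary Places</name>
  <open>1</open>
  <Folder>
    <name>Petroleum_facilities.kml</name>
    <Placemark>
      <description><![CDATA[SAM HILL OIL CO:<br></br> 725 S MAIN ST BRIGHTON CO 80601-3047
SIC: 2911
Petroleum Refining And Related Industries Petroleum Refining Petroleum Refining
NAICS:
Petroleum Facilities Emissions (258 facilities)]]></description>
      <styleUrl>#A</styleUrl>
      <Point>
        <extrude>1</extrude>
        <altitudeMode>relativeToGround</altitudeMode>
        <coordinates>-104.8257,39.9754,13.102642872</coordinates>
      </Point>
    </Placemark>
  </Folder>
</Document>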





So let's clean this up a little by adding better titles, adding links to more information, and putting the measurement in a dedicated data element instead of in the geometry.




EPA Air Emission Sources
Under the Clean Air Act, EPA establishes air quality standards to protect public health, including the health of "sensitive" populations such as people with asthma, children, and older adults. EPA also sets limits to protect public welfare. This includes protecting ecosystems, including plants and animals, from harm, as well as protecting against decreased visibility and damage to crops, vegetation, and buildings.


US Environmental Protection Agency (EPA)

1

Petroleum Facilities


Sam Hill Oil Co
]]>

#petroleum_facility

725 S Main St, Brighton, CO 80601-3047



2911


Petroleum Refining And Related Industries Petroleum Refining Petroleum Refining


Petroleum


13.102642872



-104.8257,39.9754




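For readers who want to generate KML shaped like the cleaned-up version, here is a minimal Python sketch using only the standard library. It is one possible approach under stated assumptions, not the EPA's tooling: the `Data` names (`SIC`, `SIC_description`, `sector`, `NOx`), the choice of NOx as the measurement, and the EPA link URL are illustrative guesses.

```python
import xml.etree.ElementTree as ET

KML = "http://www.opengis.net/kml/2.2"
ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", KML)       # serialize KML as the default namespace
ET.register_namespace("atom", ATOM)


def q(ns, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{ns}}}{tag}"


def build_placemark(name, address, lon, lat, data):
    """Placemark with measurements in ExtendedData, not in the geometry."""
    pm = ET.Element(q(KML, "Placemark"))
    ET.SubElement(pm, q(KML, "name")).text = name
    ET.SubElement(pm, q(KML, "address")).text = address
    ext = ET.SubElement(pm, q(KML, "ExtendedData"))
    for key, val in data.items():
        item = ET.SubElement(ext, q(KML, "Data"), {"name": key})
        ET.SubElement(item, q(KML, "value")).text = str(val)
    point = ET.SubElement(pm, q(KML, "Point"))
    # Longitude and latitude only: no fake altitude.
    ET.SubElement(point, q(KML, "coordinates")).text = f"{lon},{lat}"
    return pm


root = ET.Element(q(KML, "kml"))
doc = ET.SubElement(root, q(KML, "Document"))
ET.SubElement(doc, q(KML, "name")).text = "EPA Air Emission Sources"
author = ET.SubElement(doc, q(ATOM, "author"))
ET.SubElement(author, q(ATOM, "name")).text = (
    "US Environmental Protection Agency (EPA)")
# Assumed URL: the link should point back to wherever the data was published.
ET.SubElement(doc, q(ATOM, "link"), {"href": "https://www.epa.gov/"})

folder = ET.SubElement(doc, q(KML, "Folder"))
ET.SubElement(folder, q(KML, "name")).text = "Petroleum Facilities"
folder.append(build_placemark(
    "Sam Hill Oil Co",
    "725 S Main St, Brighton, CO 80601-3047",
    -104.8257, 39.9754,
    {   # Hypothetical field names.
        "SIC": "2911",
        "SIC_description": "Petroleum Refining",
        "sector": "Petroleum",
        "NOx": "13.102642872",
    },
))

kml_text = ET.tostring(root, encoding="unicode")
print(kml_text)
```

The design choice that matters is in `build_placemark`: the coordinates carry only position, and every measurement travels with a name that a client can look up.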

To summarize, the EPA releasing their data in KML is a really great step toward leading the way on information transparency and public awareness. However, there should be just a tiny bit more effort to demonstrate better behavior in their shared data.

About this article

Posted in KML

About the Author

Andrew Turner is an advocate of open standards and open data. He is actively involved in many organizations developing and supporting open standards, including OpenStreetMap, Open Geospatial Consortium, Open Web Foundation, OSGeo, and the World Wide Web Consortium. He co-founded CrisisCommons, a community of volunteers that, in coordination with government agencies and disaster response groups, build technology tools to help people in need during and after a crisis such as an earthquake, tsunami, tornado, hurricane, flood, or wildfire.