This is a really great step. I’ve looked at the EPA data and started writing various scripts to massage it into a nicer format than their CSV: a lot of work, a lot of data, and a pain to maintain. So it’s great that institutions are exposing their data in formats that are easy to consume and view.
However, it also illustrates the typical ‘munging’ of styling and data that users, even professional users like the EPA, will do with KML. In the Air Emission KML they are using altitude to display some measure of chemical emission. It isn’t clear what the number really means, and it seems like a bad idea to put a data measurement in the geometry.
This is especially a problem for other clients that consume this data. How are they to know whether the height is really the altitude of a facility, or some other representative number? In fact, you can export the facilities KML with altitude representing NOx, Lead, Particulate Matter, Sulfur Dioxide, or Volatile Organic Compounds, and once you’ve exported there isn’t a way to tell which one it is. Unfortunately, KML doesn’t currently support styling based on ExtendedData, other than in the balloon text. But it would still be useful to put this data there.
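A minimal sketch of what that could look like: the pollutant value moves out of the `<coordinates>` and into a named `Data` element with a human-readable `displayName`. The value is the altitude from the EPA’s Petroleum KML, read as NOx following the interpretation in the rewritten example later in this post; the EPA file does not state the units, so none are given. The element name `NOx` and the display text are my own choices, not an EPA convention.

```xml
<Placemark>
  <name>Sam Hill Oil Co</name>
  <ExtendedData>
    <!-- Hypothetical naming: one Data element per pollutant, so a client
         never has to guess what a number means. -->
    <Data name="NOx">
      <displayName>Nitrogen oxides (NOx) emissions</displayName>
      <value>13.102642872</value>
    </Data>
    <!-- Lead, Particulate Matter, Sulfur Dioxide and VOCs would follow the
         same pattern (values omitted here). -->
  </ExtendedData>
  <Point>
    <!-- Geometry carries only location: lon,lat with no data-driven altitude. -->
    <coordinates>-104.8257,39.9754</coordinates>
  </Point>
</Placemark>
```

Exporting several pollutants then just means several `Data` elements, and the file stays self-describing after it leaves the EPA site.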
In Mapufacture, for instance, we pull in and store the arbitrary data attached to a KML or RSS feed. A user could then add all emission facilities with a high NOx emission rate within 5 miles of their house to the map of their community.
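Plain `Data` values are untyped strings. KML 2.2 also lets a `Document` declare a `Schema` of typed `SimpleField`s and attach values with `SchemaData`, which tells a consumer that NOx is a number it can compare against a threshold. A sketch, assuming a hypothetical schema id `facility_emissions` and field names of my own choosing; SIC is typed as a string because SIC codes can carry leading zeros.

```xml
<Document>
  <name>Petroleum Facilities</name>
  <!-- Hypothetical schema: declares field names and types once per Document. -->
  <Schema name="FacilityEmissions" id="facility_emissions">
    <SimpleField type="string" name="SIC">
      <displayName>Standard Industrial Classification code</displayName>
    </SimpleField>
    <SimpleField type="double" name="NOx">
      <displayName>NOx emissions</displayName>
    </SimpleField>
  </Schema>
  <Placemark>
    <name>Sam Hill Oil Co</name>
    <ExtendedData>
      <!-- schemaUrl references the Schema id; each SimpleData matches a SimpleField. -->
      <SchemaData schemaUrl="#facility_emissions">
        <SimpleData name="SIC">2911</SimpleData>
        <SimpleData name="NOx">13.102642872</SimpleData>
      </SchemaData>
    </ExtendedData>
    <Point>
      <coordinates>-104.8257,39.9754</coordinates>
    </Point>
  </Placemark>
</Document>
```

The "within 5 miles of my house" part is still the consuming tool’s job; the schema only guarantees that `NOx` is numeric.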
One answer could be a link from each element to the data in a different markup such as GML, but that seems slightly convoluted and difficult for tool developers. It would still be a good solution for advanced data, and for metadata specifying how and when the data was collected. But the point here is to publish a base level of information in a lightweight, broadly used data format.
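For that more advanced route, each Placemark could carry an `atom:link` to a fuller record of the facility. The URL below is purely hypothetical: it reuses the `site=15121` value from the EPA’s chart URL as an assumed facility id, and the EPA is not known to publish per-facility GML at any such address. The `atom` prefix has to be declared on the root `kml` element.

```xml
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
  <Placemark>
    <name>Sam Hill Oil Co</name>
    <!-- Hypothetical URL: a richer (e.g. GML) record for this one facility. -->
    <atom:link rel="alternate" href="http://www.example.com/facilities/15121.gml"/>
    <Point>
      <coordinates>-104.8257,39.9754</coordinates>
    </Point>
  </Placemark>
</kml>
```

Every consumer would have to follow and parse one extra document per facility, which is why this works better as a supplement than as the base format.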
A couple of other small niggles with their KML: the title is just “Temporary Places”, which says nothing about what the data is or where it came from. It should have a title that represents the data, so that when I leave it in my KML viewer or ingest it into another tool, I can remember what it is about. KML 2.2 also supports attribution and Atom links, which should point back to the EPA site. Finally, all facility names are in all caps, which is how the data was stored in their old CSV files (when I looked before).
To make this concrete, let’s work through a simple example of turning the EPA’s current KML into a more useful KML.
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
  <Folder>
    <name>Temporary Places</name>
    <open>1</open>
    <Document>
      <name>Petroleum_facilities.kml</name>
      <Placemark>
        <name>SAM HILL OIL CO:<br></br> 725 S MAIN ST BRIGHTON CO 80601-3047</name>
        <description><![CDATA[<p>SIC: 2911<br /> Petroleum Refining And Related Industries Petroleum Refining Petroleum Refining </p> <p>NAICS: <br /> </p> <img src="http://www.epa.gov/cgi-bin/broker?_service=data&_program=dataprog.dw_emisplot_epad8_facility_caps.sas&year1=2002&year2=2002&debug=0&site=15121"width="460" height="360"> <hr><p><b>Petroleum Facilities Emissions (258 facilities)</b>]]></description>
        <styleUrl>#A</styleUrl>
        <Point>
          <extrude>1</extrude>
          <altitudeMode>relativeToGround</altitudeMode>
          <coordinates>-104.8257,39.9754,13.102642872</coordinates>
        </Point>
      </Placemark>
    </Document>
  </Folder>
</kml>
So let’s clean this up a little by adding better titles and links to more information, and by putting the data in ExtendedData instead of in the geometry.
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
  <Folder>
    <name>EPA Air Emission Sources</name>
    <open>1</open>
    <atom:author>
      <atom:name>US Environmental Protection Agency (EPA)</atom:name>
    </atom:author>
    <atom:link href="http://www.epa.gov/air/emissions/" />
    <description>Under the Clean Air Act, EPA establishes air quality standards to protect public health, including the health of "sensitive" populations such as people with asthma, children, and older adults. EPA also sets limits to protect public welfare. This includes protecting ecosystems, including plants and animals, from harm, as well as protecting against decreased visibility and damage to crops, vegetation, and buildings.</description>
    <Document>
      <name>Petroleum Facilities</name>
      <atom:link rel="self" type="application/vnd.google-earth.kmz" href="http://www.epa.gov/mxplorer/Petroleum_facilities_US.kmz"/>
      <Placemark id="sic:2911">
        <name>Sam Hill Oil Co</name>
        <address>725 S Main St, Brighton, CO 80601-3047</address>
        <description><![CDATA[<img src="http://www.epa.gov/cgi-bin/broker?_service=data&_program=dataprog.dw_emisplot_epad8_facility_caps.sas&year1=2002&year2=2002&debug=0&site=15121" width="460" height="360">]]></description>
        <styleUrl>#petroleum_facility</styleUrl>
        <ExtendedData>
          <Data name="SIC">
            <value>2911</value>
          </Data>
          <Data name="industryDescription">
            <value>Petroleum Refining And Related Industries Petroleum Refining Petroleum Refining</value>
          </Data>
          <Data name="facilityType">
            <value>Petroleum</value>
          </Data>
          <Data name="NOx (ppm)">
            <value>13.102642872</value>
          </Data>
        </ExtendedData>
        <Point>
          <coordinates>-104.8257,39.9754</coordinates>
        </Point>
      </Placemark>
    </Document>
  </Folder>
</kml>
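The cleaned-up example points its Placemark at `#petroleum_facility` but never defines that style. A minimal sketch of it, to be placed inside the `Document` before the `Placemark`: its `BalloonStyle` uses KML 2.2 entity replacement (`$[name]`, `$[address]`, `$[description]`, and `$[dataName]` for ExtendedData fields), which, as noted earlier, is the one place KML styling can currently draw on ExtendedData. The balloon layout itself is my own assumption.

```xml
<Style id="petroleum_facility">
  <BalloonStyle>
    <!-- Entity replacement: $[name], $[address] and $[description] come from the
         Placemark; $[SIC] and $[industryDescription] come from its ExtendedData. -->
    <text><![CDATA[
      <h3>$[name]</h3>
      <p>$[address]</p>
      <p>SIC $[SIC]: $[industryDescription]</p>
      $[description]
    ]]></text>
  </BalloonStyle>
</Style>
```

Because the balloon is built from the data fields, the description no longer needs to repeat the SIC code and industry text by hand.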
To summarize, the EPA releasing its data in KML is a great step toward leading the way on information transparency and public awareness; it would take just a tiny bit more effort to demonstrate better practices in the data they share.