When good data goes bad

Very cold day

Michigan has some very odd weather: snow one day, a balmy 70 deg F the next, then more snow. And this is in April, mind you. What I didn't expect, however, was next Monday's weather report: a rather frigid -9998 deg F. Note that that's the high temperature. When that is the high, I can only imagine the low is Not Applicable, because at that point you probably don't care.

But this illuminates the point that you can't trust data, especially free data. There are some great Web APIs (see Programmable Web for a fairly comprehensive listing) that allow for mashups, data sharing, and the ability to pull together great sets of data for all kinds of applications. There has been a variety of recent discussion about who is responsible for all this data, who holds the ultimate key to it, and what this means for services that prop their business up on its availability and, most importantly, its accuracy.

Imagine I have a home automation system that controls the heating based on the upcoming weather, or sends warnings or other responses based on predicted weather patterns. A decent system might catch a complete outlier that is off any known physical chart. But what if the forecast were just off by 10 degrees, or given in the wrong units? When a human is in the loop, they may or may not catch this; then again, we also don't base large system responses on just one person saying "oh yeah, it'll be cold." When automated systems are fed data, all we have is the foresight we put into designing the system to account for data integrity and accuracy, and to decide how best to evaluate them.
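To make that concrete, here is a minimal sketch of the kind of sanity check an automated system could run on a forecast before acting on it. None of this is from any real product; the field names, sentinel values, and plausibility thresholds are my own assumptions.

```python
# A hedged sketch of validating a forecast before a home automation system acts on it.
# The WeatherReading fields, sentinel list, and bounds are illustrative assumptions.

from dataclasses import dataclass
from typing import List, Optional

# Missing-data sentinels that some weather feeds use in place of real values.
SENTINELS = {-9999.0, -9998.0, 9999.0}

# Physically plausible bounds for an air-temperature forecast, in deg F.
MIN_PLAUSIBLE_F = -80.0
MAX_PLAUSIBLE_F = 135.0


@dataclass
class WeatherReading:
    high_f: Optional[float]
    low_f: Optional[float]


def validate(reading: WeatherReading) -> List[str]:
    """Return a list of problems found; an empty list means the reading looks sane."""
    problems = []
    for name, value in (("high", reading.high_f), ("low", reading.low_f)):
        if value is None:
            problems.append(f"{name} temperature is missing")
        elif value in SENTINELS:
            problems.append(f"{name} temperature is a missing-data sentinel ({value})")
        elif not (MIN_PLAUSIBLE_F <= value <= MAX_PLAUSIBLE_F):
            problems.append(f"{name} temperature {value} F is outside plausible bounds")
    if not problems and reading.low_f > reading.high_f:
        problems.append("low temperature exceeds high temperature")
    return problems


# The forecast from this post would be flagged rather than fed to the thermostat.
print(validate(WeatherReading(high_f=-9998.0, low_f=None)))
```

A check like this only catches the obvious cases (sentinels, impossible values); the subtler failure, a forecast that is merely wrong by 10 degrees, still sails through.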

Perhaps one answer is using multiple data sources, each of which helps correlate, or catch errors in, the others. However, we can't be sure those data sets are truly independent of one another and aren't just mashups of the same data. This has become very apparent with the rise of navigation systems and their incorrect location data for businesses, streets, exits, and so on. There are really only two major players in the world for navigational data, NavTeq and TeleAtlas, so Google, Yahoo!, MSN, MapQuest, the Toyota/Lexus nav systems, et al. are all just variable mashups of these same, limited data sets (look closely at the credits at the bottom of the maps). Yet users will still "double-check" one provider against another to "verify" an answer, even though the answers all came from the same source, which ultimately was a couple of workers riding around in cars with microphones and laptops, describing what they saw out the window.
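Here's a quick illustration of that trap, with made-up provider names and an invented provider-to-upstream mapping: agreement between providers only counts as corroboration when the providers are genuinely independent.

```python
# A sketch of checking for *independent* agreement across data providers.
# The provider names, forecast values, and upstream mapping are hypothetical.

from statistics import median

# Forecast highs (deg F) reported by several front-end providers.
forecasts = {"ProviderA": 34.0, "ProviderB": 34.0, "ProviderC": 51.0}

# Which upstream dataset each provider actually repackages.
upstream = {"ProviderA": "SourceX", "ProviderB": "SourceX", "ProviderC": "SourceY"}


def independent_agreement(forecasts, upstream, tolerance=5.0):
    """Group forecasts by upstream source, then ask whether the independent
    sources (not the front-end providers) agree within a tolerance."""
    by_source = {}
    for provider, value in forecasts.items():
        by_source.setdefault(upstream[provider], []).append(value)
    # One representative value per independent source.
    representatives = [median(values) for values in by_source.values()]
    spread = max(representatives) - min(representatives)
    return len(by_source), spread <= tolerance


sources, agree = independent_agreement(forecasts, upstream)
print(f"{sources} independent source(s); agreement within tolerance: {agree}")
# Three providers, but only two truly independent sources, and those two disagree.
```

Two providers "agreeing" here tells you nothing beyond the fact that they copied the same upstream feed; the real signal is how many distinct sources are behind the answers.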

So if you hear that Michigan actually freezes over next week, then you can feel confident in your data. And just think, the UV index isn't that bad.


About the Author

Andrew Turner is an advocate of open standards and open data. He is actively involved in many organizations developing and supporting open standards, including OpenStreetMap, Open Geospatial Consortium, Open Web Foundation, OSGeo, and the World Wide Web Consortium. He co-founded CrisisCommons, a community of volunteers that, in coordination with government agencies and disaster response groups, build technology tools to help people in need during and after a crisis such as an earthquake, tsunami, tornado, hurricane, flood, or wildfire.