There is clearly a movement to openly share data from numerous sources: governments, organizations, Web sites, individuals, and devices. Users can more easily publish data through collaborative sites, or find and download data that they can remix, reapply, reuse, and extend. The trajectory of open data sharing and utilization parallels the development of open source, where the magnified impact of open sharing and collaboration yields far greater outcomes.
However, unlike in the open source world, the legal and cultural frameworks in which to share data have not yet emerged. In code, there is a gamut of well-known and widely used licenses: GPL, BSD, MIT, Apache, and more. While each has unique characteristics, and can sometimes be confusing, their overall meaning and implications are easily understood by developers and are compatible with the operations of businesses that wish to use open source software. The result is a small vocabulary of regular licenses from which developers can easily pick and choose.
In the media world, Creative Commons developed an ingenious mechanism of licenses with clear verbiage and branding that makes them readily accessible to nearly anyone. With miscible license options such as "Share-Alike", "No-Derivatives", and "Non-Commercial", media producers and consumers can clearly mark appropriate uses of their works. The impact is most clearly seen on content sharing sites such as Flickr and SlideShare, where users can choose from a very small list of licenses when publishing information, or search for information available under certain terms.
Because of these attractive features (understandability, a small set of options, and branding), Creative Commons is increasingly utilized to openly share data. However, the Creative Commons licenses apply only to creative works such as stories, songs, photographs, and other media, and as such are not truly valid when applied to databases.
Instead, the data licenses in the current landscape are each unique, mutually incompatible, and difficult to understand. This situation is further complicated by the existing data business ecosystem, which thrives on charging large amounts of money to write, verify, and mix unique data licenses and to prescribe the legal uses of multiple combined data sources.
The implication is that the situation persists, and groups that would like to share or use open data are relegated to complex and expensive legal counsel, or must accept the risk and hope they remain in compliance, or at least escape notice.
There are only two potential solutions currently in development. Creative Commons has developed CC0 ("C, C, Zero"), which essentially removes all copyright from a work or data. Flickr utilizes the CC0 license for its derived boundary datasets.
The other upcoming option is the Open Database License (ODbL), which is being put forth by the OpenStreetMap community in order to create the equivalent of the Creative Commons Attribution, Share-Alike license (CC-BY-SA) for databases.
However, even these two licenses have problems. CC0 is drastic in that it removes all copyright from the data, and so may not work for anything less than a full, global release of data. The ODbL, on the other hand, is criticized for being "too left": it is unclear whether anything that utilizes data such as that from OpenStreetMap would subsequently have to be released as well. This is similar to the GPL, or "viral", licenses.
What is missing is a clear set of data-applicable open licenses that would allow anyone to easily demarcate the terms of the data they are releasing, and give data consumers confidence that they are in compliance with the data rights. The effect will be to allow data, as well as tools that interact with this data, to be made available more easily and justifiably. It will also address the many questions around collective, or combined, databases and derivative works, such as when deriving vector data from satellite imagery.
A few weeks ago I spoke on a panel at the OGC Summit on Spatial Law and Policy, which is one effort to build a community of developers, companies, data providers, and legal experts to address just such a need. In particular, times of disaster response illuminate the urgency of clear data sharing, which was the focus of the panel. The same need applies to the longer-term use of such data and derivative works, such as in rebuilding and recovery after an event.
I've discussed the pitfalls of licensing and Creative Commons-style modules before, which raised some initial questions. But in the year since then, nothing has really changed towards this broader vocabulary. Will Creative Commons merely become the de facto, if non-applicable, licensing? For example, the Ordnance Survey just released data under Creative Commons Attribution 2.0 UK. I would be interested to hear more about other efforts that are seeking to create a simple, clear set of data licenses. In addition, how else have you dealt with confusing and complicated data licensing issues in building new datasets, applications, or use cases?