

The need for clear data licenses

Published in Data  |  9 Comments


There is clearly a movement to openly share data from numerous sources: governments, organizations, Web sites, individuals, and devices. Users can more easily publish data through collaborative sites, or find and download data that they can remix, reapply, reuse, and extend. The trajectory of open data sharing and utilization parallels the development of open source, where the magnified impact of open sharing and collaboration yields far greater outcomes.

However, unlike the open source world, the legal and cultural frameworks in which to share data have not yet emerged. In code, there is a gamut of well-known and widely used licenses: GPL, BSD, MIT, Apache, and more. While each has unique characteristics, their overall meaning and implications are easily understood by developers and fit the operations of businesses that wish to use open-source software. The licenses can sometimes be confusing, yet there is a small vocabulary of regular licenses from which developers can easily pick and choose.

In the media world, Creative Commons developed an ingenious set of licenses with clear verbiage and branding that make them readily accessible to nearly anyone. With miscible license options such as “Share-Alike”, “No-Derivatives”, and “Non-Commercial”, media producers and consumers can clearly mark appropriate uses of their works. The impact is most clearly seen on content sharing sites such as Flickr and Slideshare, where users can choose from a very small list of licenses to publish information, or search for information under certain terms.

Because of these attractive features (understandability, a small set of options, and branding), Creative Commons is increasingly used to openly share data. However, the Creative Commons licenses apply only to creative works such as stories, songs, photographs, and other media, and as such are not truly valid when applied to databases.

Instead, current data licenses are all unique, mutually incompatible, and difficult to understand. This situation is further complicated by the existing data business ecosystem, which thrives on charging large amounts of money to write, verify, and mix unique data licenses and to prescribe legal uses of multiple combined data sources.

As a result, the situation persists: groups that would like to share or use open data must either pay for complex and expensive legal counsel, or accept the risk and hope they remain in compliance, or at least escape notice.

There are currently only two potential solutions in development. Creative Commons has developed CC0 (“C, C, Zero”), which essentially waives all copyright in a work or dataset. Flickr uses CC0 for its derived boundary datasets.

The other upcoming option is the Open Database License (ODbL), which is being put forth by the OpenStreetMap community in order to create the equivalent of Creative Commons Attribution-ShareAlike (CC-BY-SA) for databases.

However, even these two licenses have problems. CC0 is drastic in that it removes all copyright from the data, and so may not work for anything less than a full, global release of data. The ODbL, by contrast, is criticized for being “too left”: it is unclear whether any work that uses data such as OpenStreetMap's would itself have to be released. This is similar to the GPL and other “viral” licenses.

What is missing is a clear set of open licenses applicable to data that would allow anyone to easily demarcate the terms of the data they are releasing, and give data consumers confidence that they are in compliance with the data rights. The effect would be to make it easier and more defensible to release data, as well as tools that interact with it. It would also address the many questions around collective, or combined, databases and derivative works, such as when deriving vector data from satellite imagery.

A few weeks ago I spoke on a panel at the OGC Summit on Spatial Law and Policy, which is one effort to build a community of developers, companies, data providers, and legal experts to address just such a need. In particular, times of disaster response illuminate the urgency of clear data sharing, which was the focus of the panel, but clarity also matters for longer-term use of such data and derivative works, such as in rebuilding and recovery after an event.

I’ve discussed the pitfalls of licensing and Creative Commons-style modules before, which raised some initial questions. But in the year since then, nothing has really changed towards this broader vocabulary. Will Creative Commons merely become the de facto, if non-applicable, licensing? For example, the Ordnance Survey just released data under Creative Commons-Attribution 2.0 UK. I would be interested to hear more about other efforts that are seeking to create a simple, clear set of data licenses. In addition, how else have you dealt with confusing and complicated data licensing issues in building new datasets, applications, or use cases?


Responses

  1. Jusitn C. Houk says:

    October 23rd, 2009 at 2:27 pm (#)

    I totally agree with you on this Andrew. From my perspective working in government, licensing is a total mess. There is a wide array of data types at different levels of sensitivity to the public. One size fits all is a very crude solution. As communities follow the moves of Portland, San Francisco, DC, and others in opening data I can see some big challenges forming. Portland is basically just figuring out what their resolution means for everyone working in city bureaus. Having a toolbox of choices for licensing data would make the process much easier and perhaps encourage other communities to follow suit.

    JH

  2. miten sampat says:

    October 23rd, 2009 at 5:05 pm (#)

    Andrew: you have definitely highlighted a large issue here.

    in the online advertising world, similar challenges exist for data sharing, reporting, revenue attribution et al.

    from my perspective, this will continue to grow into a larger challenge with companies trying to build more and more semantic technologies. if you recall the speech given by Tim Berners-Lee at TED in 2009, he is inviting the next set of companies and technologies to exploit semantic data and build meaning.

    to allow this next wave of innovation, the hurdles for new ideas to get access to data have to be minimized.

    would love to work on a framework, and contribute ideas

  3. Richard Fairhurst says:

    October 23rd, 2009 at 7:30 pm (#)

    You don’t actually say what your problem is with ODbL, other than some vague hand-wavy stuff.

    Data licensing is hard. When you say you want licences “that would allow anyone to easily demarcate the terms of the data they are releasing”, yeah, I agree. I also want a pony. Try actually writing that in a way that applies both to Feist v. Rural and sweat-of-the-brow jurisdictions…

  4. Jim Richardson says:

    October 27th, 2009 at 3:02 am (#)

    For arguments in favour of the CC0 or PDDL approach in the particular case of publicly funded scientific research data, see John Wilbanks’ commentary at http://scienceblogs.com/commonknowledge/2009/05/a_breakthrough_in_data_licensi.php on the “Panton Principles”, developed by Cameron Neylon, Peter Murray-Rust and Rufus Pollock (there are links in John’s posting).

    Problems with assertion of any kind of licence rights for research data include incompatibility between different licence formulations preventing data from being combined or mashed up; and the irony that restrictions to non-commercial uses may hinder retention of data in cases where eventual preservation approaches have a commercial component.

    Relying on community norms rather than a legal tack can achieve fair attribution without tying the data up. See also http://sciencecommons.org/projects/publishing/open-access-data-protocol/ which I’ll quote (section 4.1):

    “The conflict between simplicity and legal certainty can be best resolved by a twofold measure: 1) a reconstruction of the public domain and 2) the use of scientific norms to express the wishes of the data provider.

    “Reconstructing the public domain can be achieved through the use of a legal tool (waiving the relevant rights on data and asserting that the provider makes no claims on the data).

    “Requesting behavior, such as citation, through norms rather than as a legal requirement based on copyright or contracts, allows for different scientific disciplines to develop different norms for citation. This allows for legal certainty without constraining one community to the norms of another.”

  5. Jo Walsh says:

    October 27th, 2009 at 8:57 am (#)

    dear Andrew,

    It looked like much care was taken to make it clear the extent to which “Produced Works” are not subject to any “viral” style ShareAlike conditions; only when using the entire database in public in a systematic way does one become obliged to contribute improvements to the data.

    From the ODbL text at http://www.opendatacommons.org/licenses/odbl/1.0/ :
    “Notice for using output (Contents). Creating and Using a Produced Work does not require the notice in Section 4.2 [conveying the license and indicating compliance with its terms]. However, if you Publicly Use a Produced Work, You must include a notice associated with the Produced Work reasonably calculated to make any Person that uses, views, accesses, interacts with, or is otherwise exposed to the Produced Work aware that Content was obtained from the Database, Derivative Database, or the Database as part of a Collective Database, and that it is available under this License.”

    A long time was taken to hit the balance between public domain and ShareAlike advocates, many rounds of consultation with the OSM community in particular. ODbL holds up. It fits the Open Definition – http://opendefinition.org/1.0/ . Nothing is perfect, but it looks like a good candidate license.

  6. Mikel says:

    October 30th, 2009 at 8:29 am (#)

    Open Source software licenses, and Creative Commons licenses, were the result of a large and long interactive process, which required a lot of legal knowledge capacity building among communities that are not at all lawyers.

    Try just reading the GPL or Apache License and tell me if it would make sense without the cultural context and years of experience in open source software.

    CC have done a great job at making licensing accessible to non-experts, kudos.

    The OKFN is actually doing a great job of engaging the community, answering questions, clarifying confusing points. The most simple human readable view of the license is below.

    http://www.opendatacommons.org/licenses/odbl/summary/

    As you point out, CC offer one extreme for open data. ODbL offers another. As for modularization, there are compelling arguments against this, due to the differences in data. But yes, we’ll see, all part of an evolutionary process that has already made pretty great strides.

  7. Andrew Turner says:

    November 9th, 2009 at 1:29 pm (#)

    Bad me for writing a weighty post and then traveling for 3 weeks.

    Thank you for the insightful feedback. My point here was less about potential shortcomings of the ODbL. I’m aware of the very hard work that went into making it. Momentum is forward moving with the license and it will be a great shake-out of issues and open data applications.

    My goal was to raise the issue that the ODbL is a specific use-case, and one that doesn’t apply to other options for how data is opened. Organizations need various ways and mechanisms to open data.

    Open-Source licenses are varied, and wild, but at least there is a corpus of well understood licenses. So nominally users of the software know what GPL implies vs. Apache. This isn’t the case with data licenses that are unique each and every time.

    So even with ODbL, we’ll still see data being released under unique licenses if it, for example, doesn’t need to be share-alike, or can only be used “non-commercial”, etc.

  8. Mikel says:

    November 9th, 2009 at 11:31 pm (#)

    Richard Fairhurst lays out some arguments about modules in this comment ..

    http://highearthorbit.com/does-the-opendatabase-license-need-cc-style-modules/#comment-213522

    Response?

  9. Jorge says:

    November 28th, 2009 at 8:32 pm (#)

    Hello

    I am sending you a link to a post about spatial data licenses in the Infrastructure for Spatial Information in the European Community (INSPIRE)

    http://www.orbemapa.com/2008/11/inspire-y-las-licencias-de-uso-de-los.html

    I hope it’s useful for discussion.