SwiftRiver and Crowd-Curating the Crowd-Source

Crowd Sourced Cholera?

Crowd-sourcing geospatial information has become a common component of breaking news stories. Flickr, blogs, maps, Twitter, YouTube, et al. are all normal channels that people turn to in order to share, follow, and rebroadcast reports, information, tidbits, thoughts, and actions. This pattern became prominent with the San Diego fires, continued with the Mumbai attacks, and is now playing out with swine flu.

These channels also carry errors. For example, during the Mumbai attacks there was a widely retweeted (rebroadcast) report that the government had asked people to stop twittering, when it hadn't. There were numerous other questionable outcomes from the use of this crowd-sourced data as well.

Emergency information flows

Today with swine flu, we're seeing the same issue, even in the georeferencing of traditional media that carries multiple editions of the same report. The result makes a single case at a high school in New York City look like an outbreak of numerous cases. There is no easy way to cluster these reports and either validate them or mark them as duplicates in a way that has longevity and can feed back into the flowing stream of data.
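To make the duplicate problem concrete, here is a rough sketch of a first-pass grouping of near-identical reports. It is not how any of the projects mentioned here work; the function names and the 0.8 similarity threshold are assumptions chosen for illustration, and it only catches reports whose wording is almost the same.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def cluster_reports(reports, threshold=0.8):
    """Greedily group reports whose normalized text is nearly identical.

    The threshold is an illustrative assumption, not a tuned value.
    Each cluster is a list of original texts; the first entry acts as
    the representative that later reports are compared against.
    """
    clusters = []
    for report in reports:
        norm = normalize(report)
        for cluster in clusters:
            representative = normalize(cluster[0])
            if SequenceMatcher(None, norm, representative).ratio() >= threshold:
                cluster.append(report)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([report])
    return clusters


reports = [
    "Swine flu case confirmed at a high school in New York City",
    "Swine flu case confirmed at a High School in New York City.",
    "Wildfire evacuation ordered near San Diego",
]
clusters = cluster_reports(reports)
```

Here the two editions of the school report land in one cluster and the wildfire report stands alone. Reworded or translated editions would slip through, which is part of why a human curation step is still needed.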

Previously, with minimal citizen access to data streams and reporting, the information coming through these channels had a limited reach. But with increased connectivity in a crisis - either on location or via remote observation - the information is moving faster and further. The result has been increasing concern about the negative impacts of reporting invalid information, duplicating reports, or inducing panic. However, these information patterns also have the potential for incredible impact - we're awash in bits of data that can inform and coalesce to give us a full picture of the emerging scenario.

A case study of a solution

When we were working on TwitterVoteReport, this was raised as a concern with the aggregation of the Twitter, mobile, media, and voice reports. The result was the implementation of a "Sweeper Interface" where volunteers could go through submitted voting conditions and mark them as "Approve", "Deny" or "Modify". "Approve" had the effect of bumping up a report and validating it; "Deny" marked the report as questionable and pulled it out of the stream; and "Modify" allowed a sweeper to correct the automatically extracted metadata such as wait time, condition rating, and polling location.
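The three sweeper actions can be sketched as operations on a report record. This is a minimal illustration and not the TwitterVoteReport code: the `Report` class, the `score` field, and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class Report:
    # Hypothetical record; field names are assumptions for this sketch.
    text: str
    # Automatically extracted metadata, e.g. wait time or polling location.
    metadata: dict = field(default_factory=dict)
    status: Status = Status.PENDING
    score: int = 0


def approve(report):
    """Validate a report and bump it up in the stream."""
    report.status = Status.APPROVED
    report.score += 1


def deny(report):
    """Mark a report as questionable so it is pulled from the stream."""
    report.status = Status.DENIED


def modify(report, **corrections):
    """Correct automatically extracted metadata fields."""
    report.metadata.update(corrections)


def visible_stream(reports):
    """Return non-denied reports, highest score first."""
    kept = [r for r in reports if r.status is not Status.DENIED]
    return sorted(kept, key=lambda r: r.score, reverse=True)


a = Report("Long line at the library", {"wait_minutes": 10})
b = Report("Polls closed early")
c = Report("Machines working fine")
modify(a, wait_minutes=90)   # sweeper corrects the extracted wait time
deny(b)                      # questionable, removed from the stream
approve(c)                   # validated, rises to the top
stream = visible_stream([a, b, c])
```

After these three actions the stream holds the approved report first, then the corrected one, and the denied report is gone.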

The concept is simple: if you can crowd-source the information, you can also crowd-source the filter.

Retweeting is an observable measure of the willingness and desire of the crowd to disseminate and curate information. Retweeting and reblogging are (mostly) deliberate actions, but they also muddy the actual stream of information, unless the wording is repeated verbatim, which makes it easier to cluster automatically. This energy can be used in a way that more directly helps filter the flow of reports and news.
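As a small illustration of why verbatim retweets are easy to cluster, the sketch below strips leading "RT @user:" prefixes and counts identical messages. The prefix pattern is an assumed convention, and the function names are made up for this example.

```python
import re
from collections import Counter

# Assumed convention: a retweet starts with one or more "RT @user:" prefixes.
RT_PREFIX = re.compile(r"^(?:RT\s+@\w+:?\s*)+", re.IGNORECASE)


def strip_retweet_prefix(tweet):
    """Remove leading retweet markers, leaving the original message."""
    return RT_PREFIX.sub("", tweet).strip()


def retweet_counts(tweets):
    """Count how often each underlying message appears, ignoring case."""
    return Counter(strip_retweet_prefix(t).lower() for t in tweets)


tweets = [
    "Polling station on Main St is closed",
    "RT @alice: Polling station on Main St is closed",
    "RT @bob: RT @alice: Polling station on Main St is closed",
    "Long line at city hall",
]
counts = retweet_counts(tweets)
```

The count shows how far a message has spread, not whether it is true, so it works best as a signal for sweepers about which reports to check first.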

Crowd Sourcing Filter

Chris Blow and Kaushul Jhalla brainstormed around this in the fall after the Mumbai attacks, evaluating the immediate and longer-term potential, and called the idea "SwiftRiver". You can read Erik Hersman's overview of the concept and also watch a video from the Ushahidi technical workshop in Orlando in March.

The status of the filter

We're using the open-source VoteReport platform that was so well engineered by David Troy and slightly generalized for InaugurationReport. We're now building out more of the concepts on reporting, automated filtering, and human sweeping.

Swift is now sitting behind VoteReport.in, which is actually using Ushahidi as the front-end reporting system. Reports from Ushahidi, as well as aggregated media, Twitter, and other news, are passed through, and people will be able to help tag and curate the reports in order to provide cultivated, important reports and information. You can check out the Swift codebase.

The future of the filter

While there is good progress toward user annotation and curation of crowd-sourced data, it still potentially requires pulling users out of the flow of the data itself. There could be more inline interfaces that let users provide this moderation within their Twitter client as posts flow past - or even on a mobile device, based on geolocation and on-the-ground verification of a questionable report.

And ultimately, this information then needs to get into the appropriate end user's workspace. That could be a first responder, an organization, or even a person caught in the middle of a crisis who needs accurate, up-to-date information and guidance on how to act.

This will make for a very interesting discussion at the upcoming CrisisCamp.

About this article

Posted in Neogeography, Community

About the Author

Andrew Turner is an advocate of open standards and open data. He is actively involved in many organizations developing and supporting open standards, including OpenStreetMap, Open Geospatial Consortium, Open Web Foundation, OSGeo, and the World Wide Web Consortium. He co-founded CrisisCommons, a community of volunteers that, in coordination with government agencies and disaster response groups, build technology tools to help people in need during and after a crisis such as an earthquake, tsunami, tornado, hurricane, flood, or wildfire.