

Community

SwiftRiver and Crowd-Curating the Crowd-Source

Published in Community, Geo


Crowd Sourced Cholera?

Crowd-sourcing geospatial information has become a common component of recent breaking news stories. Flickr, blogs, maps, Twitter, YouTube, et al. are all normal channels that people turn to in order to share, follow, and re-broadcast reports, information, tidbits, thoughts, and actions. In its current incarnation this has been most prevalent beginning with the San Diego fires, more recently with the Mumbai attacks, and now with swine flu.

For example, in the Mumbai Attacks there was a widely retweeted (rebroadcast) report that the government had asked people to stop twittering – when it hadn’t. In addition, there were numerous other questionable outcomes of the use of this crowd-sourced data.

Emergency information flows

Today with Swine Flu, we’re seeing the same issue even through georeferencing of traditional media that carries multiple editions of the same report. The result makes a single case in a high school in New York City look like an outbreak of numerous cases. There is no easy way to cluster these reports and either validate them or mark them as duplicates in a way that has longevity and can feed back into the flowing stream of data.

Previously, with minimal citizen access to data streams and reporting, the information coming through these channels had a limited reach. But with increased connectivity in a crisis – either on location or via remote observation – the information is moving faster and further. The result has been increasing concern about the potential negative impacts of reporting invalid information, duplicating reports, or inducing panic. However, these information patterns also have the potential for incredible impact: we’re awash in bits of data that can inform and coalesce to provide us a full picture of the emerging scenario.

A case study of a solution

When we were working on TwitterVoteReport, this was raised as a concern with the aggregation of the Twitter, mobile, media, and voice reports. The result was the implementation of a “Sweeper Interface” where volunteers could go through submitted voting conditions and mark them as “Approve”, “Deny” or “Modify”. “Approve” had the effect of bumping up a report and validating it; “Deny” marked the report as questionable and pulled it out of the stream; and “Modify” allowed a sweeper to correct the automatically extracted metadata such as wait time, condition rating, and polling location.
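To make the three sweeper actions concrete, here is a minimal sketch in Python. It is an illustration only, not the actual VoteReport code: the `Report` class, its field names, and the `approve`, `deny` and `modify` functions are all hypothetical, chosen to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Report:
    """A crowd-sourced report with auto-extracted metadata (hypothetical fields)."""
    text: str
    wait_time: Optional[int] = None   # minutes, as extracted from the message
    rating: Optional[int] = None      # overall condition rating
    location: Optional[str] = None    # polling location
    score: int = 0                    # raised each time a sweeper approves
    in_stream: bool = True            # False once a sweeper denies the report
    history: List[str] = field(default_factory=list)


def approve(report: Report) -> None:
    """Bump the report up and record the validation."""
    report.score += 1
    report.history.append("approve")


def deny(report: Report) -> None:
    """Mark the report as questionable and pull it out of the stream."""
    report.in_stream = False
    report.history.append("deny")


def modify(report: Report, **corrections) -> None:
    """Correct extracted metadata; only the metadata fields may be changed."""
    allowed = {"wait_time", "rating", "location"}
    unknown = set(corrections) - allowed
    if unknown:
        raise ValueError(f"cannot modify fields: {sorted(unknown)}")
    for name, value in corrections.items():
        setattr(report, name, value)
    report.history.append("modify")
```

Keeping a per-report history is one possible design choice: it lets a later sweeper see what earlier sweepers did before acting on the same report.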

The concept is simple: if you can crowd-source the information, you can also crowd-source the filter.

Retweeting is an observable measure of the willingness and desire of the crowd to disseminate and curate information. Retweeting and reblogging are (mostly) deliberate actions, but they also have the effect of muddying the actual stream of information (unless verbatim wording is used, which makes it easier to cluster automatically). This energy can be used in a way that more directly helps filter the flow of reports and news.
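As a rough illustration of why verbatim retweets are easy to cluster automatically, here is a small Python sketch. It assumes retweets are marked with a leading "RT @user:" or a trailing "(via @user)", which are common conventions rather than anything Swift is known to rely on; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed retweet conventions: one or more leading "RT @user:" markers,
# or a trailing "(via @user)" credit.
_RT_PREFIX = re.compile(r"^(?:rt\s+@\w+:?\s*)+", re.IGNORECASE)
_VIA_SUFFIX = re.compile(r"\s*\(?via\s+@\w+\)?\s*$", re.IGNORECASE)


def normalize(text):
    """Strip retweet markers, lowercase, and collapse whitespace."""
    text = text.strip()
    text = _RT_PREFIX.sub("", text)
    text = _VIA_SUFFIX.sub("", text)
    return " ".join(text.lower().split())


def cluster_verbatim(messages):
    """Group messages whose normalized text is identical."""
    clusters = defaultdict(list)
    for message in messages:
        clusters[normalize(message)].append(message)
    return dict(clusters)
```

This only catches verbatim copies. Reworded or truncated retweets would need fuzzier matching, which is exactly where human sweepers can help.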

Crowd Sourcing Filter

Chris Blow and Kaushul Jhalla brainstormed around this in the fall after the Mumbai Attacks, evaluating the immediate and longer-term potential, and called the idea “SwiftRiver”. You can read Erik Hersman’s overview of the concept and also watch a video from the Ushahidi technical workshop in Orlando in March.

The status of the filter

We’re utilizing the open-source VoteReport platform that was so well engineered by David Troy and slightly generalized for InaugurationReport, and we’re now building out more of the concepts on reporting, automated filtering, and human sweeping.

Swift is now sitting behind VoteReport.in, which is actually using Ushahidi as the front-end reporting system. Reports from Ushahidi, as well as aggregated media, Twitter, and other news, are passed through, and people will be able to help tag and curate the reports in order to provide cultivated, important reports and information. You can check out the Swift codebase.

The future of the filter

While there is good progress in beginning to have user annotation and curation of crowd-sourced data, it still potentially requires pulling users out of the flow of the data itself. There could be more inline interfaces for having users provide this moderation within their Twitter client as posts flow past – or even on a mobile device based on geolocation and on-the-ground verification of a questionable report.

And ultimately, this information then needs to get into the appropriate end-user’s workspace. That could be a first-responder, an organization, or even a person caught in the middle of a crisis who needs accurate, up-to-date information and guidance on how to act.

This will make for a very interesting discussion at the upcoming CrisisCamp.


VoteReport mapping and data feeds

Published in Community, GeoRSS, KML, OpenSearch, Project


Over the past two weeks I’ve been working with a great team of people helping to build VoteReport – an open public reporting system to be used during the 2008 US Election to track the situation as citizens cast their ballots. The simple goal is to make it easy for anyone to send in a report describing the wait time, overall rating and any complications that are impairing their ability to participate in the election. For more information check out http://twittervotereport.com.

Dave Troy has put together a solid backend that aggregates Twitter, SMS, voice, iPhone and Android native applications, and even YouTube. Others have built the iPhone-specific applications. I’ve been working on the mapping and data sharing side of the project. The first goal was to provide a number of mechanisms to share the data that we’re gathering with everyone. Additional mashups and visualizations are free to use the data streams to pull all the data that VoteReport itself has – so definitely go wild with your ideas. A quick breakdown of what’s available:

OpenSearch – http://votereport.us/opensearch.xml
This is the OpenSearch description document that outlines all of the feeds and various filters that you can use when getting to the data. Always check this as we’ll update it with new parameters or data streams. In addition, the various responses discussed below include OpenSearch-style pagination so you can walk through the entire database of reports without having to drink right from the firehose. This also includes the OpenSearch-Time extension.
KML – http://votereport.us/reports.kml
Getting the reports.kml will give a Network Link – this is useful for GoogleEarth and other KML clients to automatically update every 60 seconds with new reports. You can append live=1 to get the full KML document. I have included all the useful attributes in the ExtendedData element of all the Placemarks. Each Placemark also has an id for easy reference.
GeoRSS-Atom – http://votereport.us/reports.atom
If you just want to subscribe in your RSS reader, this feed is useful for getting updates.
GeoJSON – http://votereport.us/reports.json
JSON is super nice for doing client-side mashups and visualization. This is what the VoteReport Map itself is using. It includes a lot of information for each report, including reporter, icon, location.
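For client-side experiments outside the browser, here is a hedged Python sketch of reading such a feed. It assumes the JSON follows the standard GeoJSON FeatureCollection layout with Point geometries. The sample document and its property names ("reporter", "icon") are made up for illustration and may not match what reports.json actually returns.

```python
import json

# Made-up sample in standard GeoJSON FeatureCollection form. The real feed's
# property names are not documented here, so "reporter" and "icon" are guesses.
SAMPLE = """
{"type": "FeatureCollection",
 "features": [
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-77.05, 38.8]},
   "properties": {"reporter": "example_user", "icon": "marker.png"}}
 ]}
"""


def points(feed_text):
    """Yield (longitude, latitude, properties) for each Point feature."""
    collection = json.loads(feed_text)
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # skip features without a point location
        lon, lat = geometry["coordinates"][:2]
        yield lon, lat, feature.get("properties", {})


for lon, lat, props in points(SAMPLE):
    print(lon, lat, props.get("reporter"))
```

Note that GeoJSON orders coordinates as longitude first, then latitude, which is easy to get backwards when handing points to a mapping library.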

All of these feeds can also take a dtstart= parameter with an ISO-8601 date to get reports after a certain time (and optionally dtend= to bound the time range of reports). A useful geographic filter is state= with the capitalized two-letter state code, to get only reports within a state. So for example http://votereport.us/reports.atom?state=VA is a GeoRSS feed of reports in Virginia. As I mentioned, I did build a quick map that you can view at http://votereport.us/reports/map.
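The filters above can be combined in a query string. Here is a small Python sketch that builds such URLs from the parameters described (state, dtstart, dtend). The `feed_url` helper is hypothetical, and the exact ISO-8601 variant the server accepts is an assumption, so check the OpenSearch description document for the authoritative list.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://votereport.us/reports"
FORMATS = {"atom", "kml", "json"}


def feed_url(fmt: str = "atom",
             state: Optional[str] = None,
             dtstart: Optional[datetime] = None,
             dtend: Optional[datetime] = None) -> str:
    """Build a feed URL with optional state and time filters."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    params = {}
    if state is not None:
        params["state"] = state.upper()  # capitalized two-letter state code
    if dtstart is not None:
        params["dtstart"] = dtstart.isoformat()  # assumed ISO-8601 form
    if dtend is not None:
        params["dtend"] = dtend.isoformat()
    url = f"{BASE_URL}.{fmt}"
    return f"{url}?{urlencode(params)}" if params else url


print(feed_url("atom", state="va"))
```

`urlencode` percent-encodes the colons in the timestamps, so the resulting URL is safe to paste into a feed reader or an HTTP client.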

We’re continuing to build it out with new features as more data comes in. You can easily embed the map in your site using the following snippet (optionally removing the state= parameter):

<iframe src="http://votereport.us/reports/map?state=VA" frameborder="0" class="stream" width="535" height="500" scrolling="no" ></iframe>

The difficulty with creating more visualizations is the lack of pre-election data. This system has been built primarily to capture a huge amount of valuable information for one day. We’re not sure beforehand what this data will look like in terms of coverage or attributes. Typically, visualizations are made by exploring and playing with the data to see what emerges. In this case, we’re making estimates (and guiding via the tutorials) on what data we’d like. Therefore, the map itself has simple mechanisms for styling markers based on the user-supplied report. But so far the data is far too dispersed for something like a heatmap.

Fortunately, the team includes a large number of public advocates who are spreading the word, which should encourage more citizens to use the system and contribute both good and bad reports. Andy Carvin of NPR put together this NPR coverage, and we’ve also received coverage from Time, Huffington Post, New York Times, TechCrunch and even Craig Newmark. Check out the TVR press page for more coverage links.

And if you would like to help contribute to the project, check out the VoteReport Wiki. I imagine there will also be a number of post-election visualizations and analysis to come out of the reports.