“How many digital humanists does it take to change a lightbulb?”
“Yay, crowdsourcing!” — Melissa Terras
Crowdsourcing seems to be a favorite THATCamp session idea, appearing in at least half a dozen of the THATCamps held since 2008. Sessions I’ve participated in have developed from the basic “what is crowdsourcing” in 2009 to the more practical “how do you find and motivate volunteers” in 2011. At THATCamp AHA2012, however, we are fortunate to have campers who are experts at running crowdsourced projects, including Chris Lintott of GalaxyZoo, OldWeather and AncientLives, and Jen Wolfe of the University of Iowa Civil War Diaries and Letters transcription project. Though both run popular projects, their implementations could not be more different: the Zooniverse team developed sophisticated crowdsourcing software themselves, while UIowa decided on a low-tech, partly manual process to minimize the IT load on their team. I think that range of perspectives should lead to an interesting discussion, and I hope that other campers who have experience with crowdsourcing, or are simply interested in the subject, will join in.
Here are some questions that have been on my mind which might serve as conversation starters:
- Are some tasks inappropriate for volunteer crowdsourcing? Although people seem willing to volunteer their time on the most obscure of subjects, including bugs and leaves, it still may not pay to invite volunteers to do data entry on your institution’s old account books. Is it possible to predict in advance whether your material is suitable for the crowd?
- If a project won’t attract volunteer effort, might it still be worthwhile to use crowdsource-for-pay systems like Amazon’s Mechanical Turk or various freelancing sites? If so, how do you ensure accuracy? (One recent project introduced known bad data into transcripts before paying users to proofread them, and had a 19th-century diary transcribed for thirty cents a page.)
- Volunteers seem to participate according to a power-law distribution in which a few users contribute the majority of the effort. (See the Transcribe Bentham leaderboard or the North American Bird Phenology Program’s top fifty transcribers chart for examples.)
  - Is this something we should be concerned about or a phenomenon we should embrace?
  - Do all projects demonstrate the same participation patterns? (My own small efforts have shown small-scale projects to be even more lop-sided than the large ones.)
  - How do we find those few passionate volunteers? Where does a small project find a target-rich environment for its outreach efforts?
- Is it important to provide users with context? Christine Madsen argues that libraries and archives should stop presenting entire manuscript pages to users, as this can make their tasks feel more like work. On the other hand, Andie Thomer and Rob Guralnick believe that context is an important tool for motivating volunteers and enabling accuracy.
I hope that my fellow campers will add their own questions to these in the comments to this post.