Ben Brumfield

Title / Position: Partner
Organization: Brumfield Labs, LLC
Website: manuscripttranscription.blogspot.com
Twitter: benwbrum

Ben Brumfield is a partner at Brumfield Labs, LLC, a software firm specializing in the digital humanities. In 2005, he began developing one of the first web-based manuscript transcription systems. Released as the open-source tool FromThePage, it has since been used by libraries, museums, and universities to transcribe literary drafts, military diaries, scientific field notes, and punk rock fanzines. Ben has been covering crowdsourced transcription technologies on his blog since 2007.

In addition to crowdsourcing for public history, Brumfield Labs has collaborated with scholars on digital scholarly edition projects including the Digital Austin Papers and the Civil War Governors of Kentucky Digital Documentary Edition.

Session Proposal: Crowdsourcing

Posted by Ben Brumfield, 30 Dec · Category: General

“How many digital humanists does it take to change a lightbulb?”
“Yay, crowdsourcing!” — Melissa Terras

Crowdsourcing seems to be a favorite THATCamp session idea, appearing in at least half a dozen of the THATCamps held since 2008. Sessions I've participated in have progressed from the basic "what is crowdsourcing?" in 2009 to the more practical "how do you find and motivate volunteers?" in 2011. At THATCamp AHA2012, however, we are fortunate to have campers who are experts at running crowdsourced projects, including Chris Lintott of Galaxy Zoo, Old Weather, and Ancient Lives, and Jen Wolfe of the University of Iowa Civil War Diaries and Letters transcription project. Though both run popular projects, their implementations could not be more different: the Zooniverse team developed sophisticated crowdsourcing software themselves, while UIowa chose a low-tech, partly manual process to minimize the IT load on their team. I think that range of perspectives should lead to an interesting discussion, and I hope that other campers who have experience with crowdsourcing, or who are simply interested in the subject, will join in.

Here are some questions that have been on my mind which might serve as conversation starters:
- Are some tasks inappropriate for volunteer crowdsourcing? Although people seem willing to volunteer their time on even the most obscure subjects, including bugs and leaves, it still may not pay to invite volunteers to do data entry on your institution's old account books. Is it possible to predict in advance whether your material is suitable for the crowd?
- If a project won't attract volunteer effort, might it still be worthwhile to use paid crowdsourcing systems like Amazon's Mechanical Turk or various freelancing sites? If so, how do you ensure accuracy? (One recent project deliberately introduced known errors into transcripts before paying workers to proofread them, and transcribed a 19th-century diary for thirty cents a page.)
- Volunteers seem to participate according to a power-law distribution in which a few users contribute the majority of the effort. (See the Transcribe Bentham leaderboard or the North American Bird Phenology Program’s top fifty transcribers chart for examples.)
  
  Is this something we should be concerned about or a phenomenon we should embrace?
  
  Do all projects show the same participation patterns? (My own small efforts suggest that small-scale projects are even more lopsided than large ones.)
  
  How do we find those few passionate volunteers? Where does a small project find a target-rich environment for its outreach efforts?
- Is it important to provide users with context? Christine Madsen argues that libraries and archives should stop presenting entire manuscript pages to users, as this can make their tasks feel more like work. On the other hand, Andie Thomer and Rob Guralnick believe that context is an important tool for motivating volunteers and enabling accuracy.

I hope that my fellow campers will add their own questions to these in the comments on this post.
