• Session Proposal: Crowdsourcing

    “How many digital humanists does it take to change a lightbulb?”
    “Yay, crowdsourcing!”
    Melissa Terras

    Crowdsourcing seems to be a favorite THATCamp session idea, appearing in at least half a dozen of the THATCamps held since 2008.  Sessions I’ve participated in have developed from the basic “what is crowdsourcing” in 2009 to the more practical “how do you find and motivate volunteers” in 2011.  At THATCamp AHA2012, however, we are fortunate to have campers who are experts at running crowdsourced projects, including Chris Lintott of GalaxyZoo, OldWeather, and AncientLives, and Jen Wolfe of the University of Iowa Civil War Diaries and Letters transcription project.  Though both run popular projects, their implementations could not be more different: the Zooniverse team developed sophisticated crowdsourcing software themselves, while UIowa decided on a low-tech, partly-manual process to minimize the IT load on their team.  I think that range of perspectives should lead to an interesting discussion, and hope that other campers who have experience with crowdsourcing or are just interested in the subject will join in.

    Here are some questions that have been on my mind which might serve as conversation starters:

    • Are some tasks inappropriate for volunteer crowdsourcing?  Although it seems like people are willing to volunteer their time on the most obscure of subjects–including bugs and leaves–it still may not pay to invite volunteers to do data-entry on your institution’s old account books.  Is it possible to predict in advance whether your material is suitable for the crowd?
    • If a project won’t attract volunteer effort, might it still be worthwhile to use crowdsource-for-pay systems like Amazon’s Mechanical Turk or various freelancing sites?  If so, how do you ensure accuracy?  (One recent project transcribed a 19th-century diary for thirty cents a page, introducing known bad data into the transcripts before paying users to proofread them.)
    • Volunteers seem to participate according to a power-law distribution in which a few users contribute the majority of the effort. (See the Transcribe Bentham leaderboard or the North American Bird Phenology Program’s top fifty transcribers chart for examples.)
      • Is this something we should be concerned about or a phenomenon we should embrace?
      • Do all projects demonstrate the same participation patterns?  (My own small efforts have shown small-scale projects to be even more lop-sided than the large ones.)
      • How do we find those few passionate volunteers?  Where does a small project find a target-rich environment for its outreach efforts?
    • Is it important to provide users with context?  Christine Madsen argues that libraries and archives should stop presenting entire manuscript pages to users, as this can make their tasks feel more like work.  On the other hand, Andie Thomer and Rob Guralnick believe that context is an important tool for motivating volunteers and enabling accuracy.
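
    For anyone who wants to check how lopsided participation is on their own project, here is a minimal sketch in Python. It assumes you can export a count of pages transcribed per volunteer; the function name `top_share` and the numbers in `example_counts` are invented for illustration and do not come from any of the projects mentioned above.

```python
def top_share(page_counts, top_n):
    """Return the fraction of all pages contributed by the top_n volunteers."""
    total = sum(page_counts)
    if total == 0:
        return 0.0  # no contributions yet, so no share to report
    ranked = sorted(page_counts, reverse=True)  # most active volunteers first
    return sum(ranked[:top_n]) / total


# Hypothetical per-volunteer page counts, invented for illustration only.
example_counts = [400, 120, 35, 10, 5, 3, 2, 2, 1, 1]

share = top_share(example_counts, 2)
print(f"Top 2 of {len(example_counts)} volunteers: {share:.0%} of pages")
```

    Comparing this share across projects of different sizes would be one rough way to explore whether small projects really are more lopsided than large ones.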

    I hope that my fellow campers will add their own questions to these in the comments to this post.



  1. Katrina says:

    Could this session also include crowdsourced peer review? I have been intrigued with the experiments (such as with the Shakespeare journal that tried this).

    I am interested too in projects like the Trove newspaper digitisation at NLA, which does rely on crowdsourced corrections of text. (Isn’t CAPTCHA a kind of crowdsourcing too, even though many participants would not realise this?)

  2. I don’t see why not. My perspective may be limited based on my experience with crowdsourced manuscript transcription, but I hope not to restrict the session to just that niche.

  3. Melody says:

    UIowa’s transcription project on Civil War diaries has been really inspiring to watch. I was told one transcriber went through 400 pages on her own. This demonstrates a kind of personal engagement with the content that librarians/archivists/content providers dream of. And isn’t that the more idealistic goal: connecting users to sources that enrich their life experiences?

    So the question I have is: how to go from the base need (recruiting volunteers to provide free labor) to the ideal (engaging members of the public to enrich their life experiences)?

  4. Today I interviewed the main volunteer for Southwestern University’s Zenas Matthews Mexican War diary project. In that case, the passionate volunteer arrived before the project was even announced — the head of Special Collections asked a patron who’d written about related material to take a quick peek and tell her what he thought. Within sixteen days he transcribed the entire diary and moved on to adding notes on the people and places mentioned. This has made our big public launch a bit awkward since the first pass has already been done, but I think that’s a wonderful problem to have.

    He told me “it keeps calling me back.”