Will an Army of Captcha-Solvers Be Unleashed on Web 2.0?


Do you love the Web 2.0 movement? Do you believe in the interactivity of the Internet? Might you run a blog that asks for comments, or do you submit comments to blogs? And in your interactive actions, do you worry about blog spamming?


And we're not talking about annoying Nigerian 419 email scams; we're talking about comment spam.

The annoying comment spam that has forced bloggers to put up captcha tests (automated tests to tell computers and humans apart) to stop computer-generated spam from flooding every open comment form.

OLPC News employs a captcha test to separate the two, humans from machines, but alas it is not perfect. Often humans overlook the tests in their haste to post and then feel slighted when their comment is lost to the junk bin and doesn't appear.

And the spam comments can be so voluminous that they overwhelm the OLPC News backend systems even if they are never actually published. Still, captchas are the best defense against bots and indispensable to any site wishing to foster conversation.

But what if it became cheap enough to pay people to submit comment spam, humans who can pass captcha tests? As Charles Arthur in the Guardian explains:

I also expect that once a few [OLPC's] have got into the hands of people aching to make a dollar, with time on their hands and an internet connection provided one way or another, we'll see a significant rise in captcha-solved spam.

But, as my spammer contact pointed out, it's nothing personal. You have to understand: it's just business.

And a big business at that. Nigerian 419 spam brings in $100 million per year from the USA alone, and blog spamming is an even easier and possibly more lucrative business. A business that Edward Hasbrouck reports has already infiltrated travel industry website comment sections:

Elias Plishner, V.P. of the interactive division of the McCann-Erickson advertising agency, boasted that, "We have an entire division in Singapore [where labor is cheaper than in the USA] devoted to seeding online forums and bulletin boards with targeted content" for our advertising clients.

A business that, oddly enough, is as scared of millions of bored and poor Internet-connected children as you might be. To quote Charles Arthur in the Guardian again:

[E]arlier this year, I spoke with someone who does blog spamming for a living - a very comfortable living, he claimed. But he said that the one thing that did give him pause was the possibility that rival blog spammers might start paying people in developing countries to fill in captchas: they could always use a bit of western cash, would have the spare time and, increasingly, cheap internet connections to be able to do such tedious (but paid) work.

Feel that fear too? The horror of imagining a millions-strong army of OLPC-enabled captcha-solvers unleashed on Web 2.0, filling blog comments with spam that no automated filter could stop.

Now stop to think about solutions if human-generated comment spam starts to overwhelm. Ian Ozsvald suggests that:

We can’t employ people to act as human filters, that’ll get too expensive too quickly. Instead we’ll need to improve current A.I. techniques in the field of Image Processing and Natural Language Processing.

I respectfully disagree. I can see employing human filters as the natural counter to human comment spam. Those same cheap-labour centers, OLPC-enabled students even, can be hired to filter comments either in real time or in batches. It may not be pretty, or perfect, but it would be effective.

In fact, I may even attempt it on OLPC News, using Amazon's Mechanical Turk service as an economical way to rescue the occasional valid comment that falls into the always-full spam junk folder, so we can focus more on content creation and less on comment management.



Hi Wayan. I quite agree that humans could be hired, sat behind OLPCs, to filter comment spam. I believe, however, that the problem is to do with numbers: if a human is able to craft a spam comment which is clever enough to get past the targeted spam filter, then a spammer can take this specially-crafted message (or variants of it) and blast it out to hundreds (or tens) of related forums and blogs.

A human filter will only see one message from one site at a time, so the human is limited by the number of messages they can see, and also by the fact that they need to read most or all of the messages that need filtering. You could batch these messages (assuming the blogs/forums all subscribe to the same anti-spam+human-powered service), but if you can batch them then presumably regular statistical techniques could be used for the spam classification.

I don't think the numbers are on the side of the human filter at all!

I'm assuming here that we're talking about really clever spam messages, and that regular spams are being filtered by regular anti-spam techniques. So, I'll stand by my original argument that improvements need to be made to natural language and image processing toolkits to automate the defence against this form of attack :-)


You might be interested in this Google talk, where the speaker uses human power for good. I like when he says "captchas are creating jobs in 3rd world countries!"


Solution to the "millions of kids in third world countries solving captcha challenges for pay" problem: If those kids are getting paid, then they have a source of income (oh the humanity!), so sell ad space in the captcha images, to make them want to buy iPods and WOW accounts and wallpaper paste and crap like that. If you've got millions of eyeballs, then of course you can make money off of them!

Sarcasm aside, my point is that the "problem" of paying millions of kids in third world countries to use their OLPCs to solve captcha challenges is the kind of wonderful "problem" you'd only have if you were brilliantly successful at something much more important (getting computers and money to needy kids) than the negative side-effects of the "problem" (blog spam). I'd love to live in a world that has such "problems" as millions of poor kids with computers and jobs.

Why not just pay some OLPC-enabled kids to delete your spam? Or put them to work on the other side of the problem: detecting legitimate human users?

It really seems out of proportion to complain that enabling kids with computers will increase your blog spam. Does the author of this article have some kind of anti-OLPC agenda, or is he in the employ of some organization with an anti-education agenda, like the Republican Party or Catholic Church?


How about we just turn the comment features off? Nice simple solution. Email spam is more tricky.

Come on, what's next? Armies of OLPC-equipped kids making human DoS attacks and SSH brute-force attempts all over the world?

If we turned off the comment feature, we couldn't have this very conversation on OLPC News. Open and public reader feedback and input is a key feature of Web 2.0.

You might want to re-read the post. I do suggest that OLPC-equipped kids would be a great blog spam filter.

And I agree, may we all be so lucky as to have such "success problems" in our work. It's just good to think about what those problems might be in order to mitigate their impact.

There is already too much spam in the world. More computers will just mean more spam. How many more Nigerian emails do you want?

Captchas are simply pointless now because there already are captcha solving shops in China. Having more of them in Africa will not change a damn thing.