Google’s new toxic-language algorithm is surprisingly TERRIBLE at detecting toxic language
12:15 pm



There’s little doubt that AI and robots pose a very interesting challenge to the assumptions human beings have about work, utility, wages, and productivity. “Labor-saving” was always a positive adjective, but lately it seems more like a threat. In certain quarters of Silicon Valley, however, the war’s already been lost: the rise of the robots is inevitable and there’s nothing we can do about it.

Every now and then, though, you run into an example of how hard it is for machines to mimic the kinds of intellectual work we do with hardly a thought. A recent example is Perspective, the new machine-learning service that Alphabet (Google’s parent company) released yesterday. The purpose of Perspective is to identify hateful or trollish content on message boards as a way of enhancing the quality of online discourse. In the wake of the 2016 elections, in which armies of anti-Clinton trolls paid by the Russian government almost certainly had a significant impact on the outcome, the question of how to improve social media is a pressing one indeed, and I wish Alphabet all the luck in the world in achieving that objective.

However, Perspective’s got a ways to go, and some of the errors the program has been shown to make are enough to make one question whether machines will ever be able to parse meaning-laden human expression with any accuracy. Bottom line: humans, in their ability to evade detection for nasty invective, are way, way ahead of the machines.

A report by David Auerbach in the MIT Technology Review offers plenty of vivid examples. Perspective gives comments a rating from 1 to 100 on “toxicity,” which is defined as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.” For certain kinds of basic statements, Perspective does fairly well. The program rates “Screw you, Trump supporters” as highly toxic, while “I honestly support both” is not flagged. So far, so good.

But the algorithm leans too heavily on hot-button keywords and pays too little attention to the surrounding context, especially words like “not” that reverse the polarity of what’s being said. “Rape,” “Jews,” “terrorist,” and “Hitler” are all likely to increase your toxicity score, even in comments that are mostly placating or unobjectionable.
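To see why leaning on keywords goes wrong, here is a deliberately crude toy scorer. It is not how Perspective works internally (Alphabet’s model is a trained machine-learning system, and nothing here comes from its code); the word list, the weights, and the `keyword_score` function are all invented for illustration. It simply shows the failure mode: a scorer that only looks for trigger words cannot tell a statement from its negation, and coded phrases with no trigger words sail straight through.

```python
import re

# Hypothetical keyword weights, made up for this sketch only.
# They are NOT Perspective's real features or scores.
HOT_WORDS = {"hitler": 0.6, "terrorist": 0.7, "sucks": 0.8, "rape": 0.9}


def keyword_score(comment: str) -> float:
    """Return the highest weight of any hot-button word in the comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return max((HOT_WORDS.get(w, 0.0) for w in words), default=0.0)


if __name__ == "__main__":
    for text in [
        "Few Muslims are a terrorist threat",
        "Hitler was an anti-Semite",
        "Hitler was not an anti-Semite",
        "race war now",
    ]:
        print(f"{keyword_score(text):.2f}  {text}")
```

In this toy, the two Hitler sentences get identical scores because “not” carries no weight at all, and “race war now” scores zero because none of its words are on the list.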

Auerbach supplies a hilarious account of the ways Perspective gets it wrong: 

“Trump sucks” scored a colossal 96 percent, yet neo-Nazi codeword “14/88” only scored 5 percent. “Few Muslims are a terrorist threat” was 79 percent toxic, while “race war now” scored 24 percent. “Hitler was an anti-Semite” scored 70 percent, but “Hitler was not an anti-Semite” scored only 53%, and “The Holocaust never happened” scored only 21%. And while “gas the joos” scored 29 percent, rephrasing it to “Please gas the joos. Thank you.” lowered the score to a mere 7 percent. (“Jews are human,” however, scores 72 percent. “Jews are not human”? 64 percent.)

Humans are highly subtle when it comes to language, and machines find it hard to keep up. A particularly chilling example from the MIT Technology Review article is the sentence “You should be made into a lamp,” which is a direct allusion to Nazi atrocities and has been directed at several journalists in recent months. Perspective gives that a toxicity rating of 4.

It’s hard enough to parse language for hateful intent; imagine how much harder when you toss in a factor like juxtaposition with an image. A sentence like “You can trust me to do the right thing” has a completely different meaning when placed next to a picture of Pepe the Frog, wouldn’t you think?

Posted by Martin Schneider