According to a new study, the machine learning systems that major social media sites and online platforms deploy to detect hate speech are “brittle and easy to deceive”.
The study, led by researchers at Finland’s Aalto University, found that bad grammar and awkward spelling, whether intentional or not, can prevent artificial intelligence (AI) detectors from spotting toxic social media comments.
The researchers said that modern natural language processing (NLP) techniques can classify text based on individual characters, words or sentences. When such systems face textual data that differs from the data used in their training, however, they become unreliable.
“We inserted typos, changed word boundaries or added neutral words to the original hate speech. Removing spaces between words was the most powerful attack, and a combination of these methods was effective even against Google’s comment-ranking system Perspective,” said Tommi Grondahl, a doctoral student at the university.
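To make the attacks concrete, here is a minimal sketch of the three kinds of modifications the quote describes; the function names are illustrative and not the study’s actual code:

```python
# Illustrative sketch (not the study's actual code) of the three
# modifications the researchers describe.
import random

def insert_typo(text: str) -> str:
    """Swap two adjacent characters at a random position."""
    i = random.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def remove_spaces(text: str) -> str:
    """Erase word boundaries -- the attack the study found most powerful."""
    return text.replace(" ", "")

def add_neutral_word(text: str, word: str = "love") -> str:
    """Append an innocuous word to dilute the apparent toxicity."""
    return f"{text} {word}"

print(add_neutral_word(remove_spaces("I hate you")))  # -> "Ihateyou love"
```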
The team put a total of seven state-of-the-art hate speech detectors to the test in the study. All of them failed.
Google’s Perspective, which ranks the “toxicity” of comments using text analysis methods, was among them.
Earlier research had found that Perspective could easily be fooled by introducing simple typos.
Grondahl’s team discovered that although Perspective has since become resilient to simple typos, it can still be fooled easily by other modifications, such as removing spaces between words or adding an innocuous word like “love”.
For example, the sentence “I hate you” slipped through the sieve and was rated non-hateful once it was modified to “Ihateyou love”.
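Readers who want to probe this behaviour can query the public Perspective API directly. The sketch below assumes Google’s documented commentanalyzer endpoint and a placeholder API key; since Perspective’s model is updated over time, the scores it returns today may differ from those observed in the study:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; request one from Google
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity(text: str) -> float:
    """Return Perspective's TOXICITY score (0..1) for a comment."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=body)
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

print(toxicity("I hate you"))     # original sentence
print(toxicity("Ihateyou love"))  # modified sentence from the study
```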
In practice, hate speech is subjective and highly context-specific, rendering text analysis techniques insufficient as stand-alone solutions, the researchers noted.
Their recommendation is to pay more attention to the quality of the data sets used to train machine learning models, rather than to refining the model design.
The results are slated to be presented at the forthcoming ACM AISec workshop in Toronto.
Hate speech on social media has long been a problem for online tech giants. Many companies took steps to address it after artificial intelligence emerged as a ray of hope. But as this research shows, such systems can be fooled very easily and are not a complete solution on their own.