The news, also reported by Multilingual, has stirred considerable controversy among users worldwide. A system bug, likely stemming from bias within the language model that Instagram (Meta) uses for machine translation of content posted on the platform, mistakenly added the word “terrorists” to various posts related to the Palestinian people.
Several Instagram users noticed that when the English word “Palestinian” appeared in a sentence containing the Palestinian flag emoji and the Arabic word “alhamdulillah” (translatable as “Praise be to God” or “thank God”), the algorithm would mistranslate the content by inserting the word “terrorist.”
An extremely grave error, capable of serious consequences for individual users and the movement they are affiliated with, and with the potential to offend an entire nation and its people.
The news, first reported by the technology outlet 404 Media, could only further inflame tensions, especially in this tragic period of history.
The response from Meta, Instagram’s parent company, was not long in coming. Through a spokesperson, it told The Guardian Australia that the issue had been resolved, without, however, providing further details on its possible causes.
Meta and Machine Translation: Why the Term “Terrorists”?
The void left by the absence of any explanation of the problem’s causes inevitably made ample room for personal speculation, and the hypothesis that the tech giant Meta was at fault certainly could not be overlooked.
Certainly, it is not up to us to pass judgment. What we can hypothesize, however, is the presence of biases in the data used to train the language model behind the machine translation system.
As reported by 404 Media, researchers Gabriel Nicholas and Aliya Bhatia from the Center for Democracy and Technology also believe that the bug is due to the training data of the model.
In short, if the system is trained on data, often sourced online, that contains biases, the model will reproduce those biases with each new occurrence unless direct intervention is made to remove them.
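To make that mechanism concrete, here is a minimal Python sketch using an invented toy corpus and a deliberately naive bigram model; none of this reflects Meta's actual system, only the general statistical principle that a skew in training data resurfaces directly in a model's output:

```python
from collections import Counter, defaultdict

# Invented toy corpus, for illustration only: text mentioning "group_a"
# is skewed toward a negative word, while text mentioning "group_b" is not.
corpus = [
    "group_a festival",
    "group_a threat",
    "group_a threat",
    "group_b festival",
    "group_b holiday",
]

# A naive bigram model: count which word follows each word in the corpus.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the completion the model has seen most often after `word`."""
    return follows[word].most_common(1)[0][0]

print(most_likely_next("group_a"))  # -> 'threat'   (the data skew is reproduced)
print(most_likely_next("group_b"))  # -> 'festival'
```

Nothing in the code itself is biased: the skew comes entirely from the frequency statistics of the examples, which is why such bugs are fixed by curating the training data or adding targeted corrections rather than by patching the algorithm.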
“Meta faces a dearth of available training data in languages other than English and specifically in Arabic dialects that might not be widely spoken or geopolitically strong,” Bhatia stated. “The language model is making the connection [based on whatever is] in the available examples of either Arabic language speech or speech related to Palestinian people, and that means the output is reflective of the perspective that this text has.”
It is precisely for this reason that, to date, the gold standard for translation services still requires the indispensable presence of a human translator.
Sources: Multilingual – 404 Media – The Guardian