Machine or automated translation: how does it work?

Ever since Google Translate appeared, there have been many jokes about the translations users got from it. Some were very funny, others very odd, and some even dangerous, since they could put anyone in an awkward or even disastrous situation. Yet Machine Translation has, over time, won the hearts of many, including translators who in the past would not accept its use. But what exactly is machine or automated translation? How does it work, and how much has it changed over the years?

Memsource’s blog defines Machine Translation (MT) as “automated translation by computer software”. It should not be confused with CAT (Computer-Assisted Translation), which refers to tools that help human translators with their work, for example by dividing the text into chunks that are easier to manage. Machine Translation follows a simple process but goes further: the software takes the original text (the Source), divides it into chunks (the Segments), and replaces each one with equivalent words and phrases in the language it is being translated into (the Target). Machine Translation can sometimes deliver very impressive results with no human interaction at all. Even so, anyone using MT must know both the source and the target language very well, as well as the subject matter where relevant, to guarantee a text that at least makes sense.
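The Source → Segments → Target process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real MT engine works: it uses a hypothetical English-to-Portuguese glossary and naive word-for-word (1-to-1) substitution, so it only produces sensible output for sentences whose word order is the same in both languages.

```python
import re

# Hypothetical toy glossary: English source word -> Portuguese target word.
GLOSSARY = {
    "the": "o",
    "cat": "gato",
    "sleeps": "dorme",
    "dog": "cão",
    "runs": "corre",
}


def segment(source: str) -> list[str]:
    """Split the Source text into sentence-level Segments."""
    # Break wherever sentence-final punctuation is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", source.strip())
    return [p for p in parts if p]


def translate_segment(segment_text: str) -> str:
    """Naive 1-to-1 substitution: swap each word found in the glossary."""
    tokens = re.findall(r"\w+|[^\w\s]", segment_text.lower())
    target_tokens = [GLOSSARY.get(tok, tok) for tok in tokens]
    # Re-join the tokens, attaching punctuation to the preceding word.
    text = ""
    for tok in target_tokens:
        if not text or re.fullmatch(r"[^\w\s]", tok):
            text += tok
        else:
            text += " " + tok
    return text.capitalize()


def translate(source: str) -> str:
    """Segment the Source, translate each Segment, and join the Target."""
    return " ".join(translate_segment(s) for s in segment(source))


print(translate("The cat sleeps. The dog runs."))  # O gato dorme. O cão corre.
```

Splitting on sentence-final punctuation is a simplification; real segmenters also have to handle abbreviations, decimal numbers, and similar cases.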

Machine Translation: how does it work?

Many people don’t know that Machine Translation dates back much further than the late 1990s or early 2000s; there is a long history behind it. The first automated translation systems, in the 1950s, were more like machines than computers and often relied on punch cards. Since then, its technological development has never stopped, moving from punch cards to simple word substitution (a 1-to-1 approach) and on to what we have today. The past ten years brought the biggest transformation in Machine Translation, with the rise of Neural MT.

Today, there are three main approaches to Machine Translation. The first is “Rule-based”: it uses grammar and language rules, along with glossaries that can be customized for a specific subject. It looks like this:

[Figure: Rule-based Machine Translation]
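To make the rule-based idea concrete, here is a minimal Python sketch that translates simple English noun phrases of the form “the ADJECTIVE NOUN” into Portuguese. The lexicon and the three rules (adjective placed after the noun, gender agreement, plural agreement) are hypothetical and deliberately tiny; a real rule-based system needs far more rules and much larger glossaries.

```python
# Hypothetical lexicon: English noun -> (Portuguese noun, grammatical gender).
NOUNS = {"car": ("carro", "m"), "house": ("casa", "f"), "cat": ("gato", "m")}
# Hypothetical adjectives, stored in their masculine singular Portuguese form.
ADJECTIVES = {"red": "vermelho", "black": "preto", "new": "novo"}


def inflect_adjective(masc_sing: str, gender: str, plural: bool) -> str:
    # Rule: adjectives ending in -o take -a in the feminine.
    if gender == "f" and masc_sing.endswith("o"):
        form = masc_sing[:-1] + "a"
    else:
        form = masc_sing
    # Rule: the adjective agrees with the noun in number.
    return form + "s" if plural else form


def article(gender: str, plural: bool) -> str:
    # Rule: the definite article agrees with the noun in gender and number.
    base = "a" if gender == "f" else "o"
    return base + "s" if plural else base


def translate_noun_phrase(phrase: str) -> str:
    """Translate an English 'the ADJECTIVE NOUN' phrase into Portuguese."""
    det, adj, noun = phrase.lower().split()
    if det != "the":
        raise ValueError("this sketch only handles the definite article 'the'")
    # Rule: a trailing -s marks an English plural (regular nouns only).
    plural = noun not in NOUNS and noun.endswith("s")
    lemma = noun[:-1] if plural else noun
    pt_noun, gender = NOUNS[lemma]
    pt_adj = inflect_adjective(ADJECTIVES[adj], gender, plural)
    # Rule: Portuguese places most descriptive adjectives after the noun.
    return " ".join([article(gender, plural), pt_noun + ("s" if plural else ""), pt_adj])


print(translate_noun_phrase("the red houses"))  # as casas vermelhas
```

The article mentions subject-verb agreement; this sketch shows the analogous noun-adjective agreement, which is the same kind of rule applied within a noun phrase.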

A translation produced this way takes into account, for instance, the position of the verb in the sentence, the use of the plural, and subject-verb agreement. The second approach is “Statistical”: here the machine “learns” how to translate not from language rules but from a comparative analysis of a large body of existing human translations.

[Figure: Statistical Machine Translation]
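The “learning from existing translations” step can be illustrated with IBM Model 1, a classic statistical word-alignment model trained with the expectation-maximization (EM) algorithm. The article does not name a specific model, so treat this as one representative example. The sketch assumes a hypothetical three-sentence English-Spanish parallel corpus and, for brevity, omits the NULL word that the full model includes. No grammar rules are supplied; the word translation probabilities emerge only from which words keep appearing together.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(target_word | source_word) from sentence pairs with EM."""
    target_vocab = {w for _, tgt in corpus for w in tgt.split()}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) pairs
        total = defaultdict(float)  # expected count of each source word
        # E-step: share each target word's count among the source words
        # in proportion to the current translation probabilities.
        for src, tgt in corpus:
            src_words = src.split()
            for f in tgt.split():
                norm = sum(t[(f, e)] for e in src_words)
                for e in src_words:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: turn expected counts back into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(target | source_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (English source, Spanish target).
CORPUS = [
    ("the house", "la casa"),
    ("the door", "la puerta"),
    ("a door", "una puerta"),
]

t = train_ibm_model1(CORPUS)
for word in ["the", "house", "door", "a"]:
    print(word, "->", best_translation(t, word))
```

After the first iteration “house” is still equally likely to be “la” or “casa”; because “la” also co-occurs with “the” and “door”, the model progressively attributes “la” to “the” and “casa” to “house”, ending with the pairs the → la, house → casa, door → puerta, a → una.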

Neural Machine Translation: the more data, the better

The third approach is “Neural”. It looks like this:

[Figure: Neural Machine Translation]

Using a large neural network, Neural Machine Translation “teaches” itself how to translate. Mathematically, its goal is to estimate (that is, to approximate rather than determine exactly) an unknown conditional distribution P(y|x) from a dataset D, where x and y are variables representing the source input and the target output. There are two types: the Encoder-Decoder model and the Encoder-Decoder with Attention. Both are multilayer models.
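The conditional distribution P(y|x) is usually made tractable by factorizing it word by word. The following is the standard formulation used in the NMT literature rather than something this article specifies; it introduces θ for the network’s parameters, T for the length of the target sentence y = (y_1, ..., y_T), and y_{<t} for the words y_1, ..., y_{t-1} produced before step t.

```latex
P(y \mid x; \theta) = \prod_{t=1}^{T} P(y_t \mid y_{<t}, x; \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \sum_{(x,\, y) \in D} \log P(y \mid x; \theta)
```

In words: the model predicts the target sentence one word at a time, each prediction conditioned on the source and on the words already produced, and training chooses the parameters that make the human translations in D as probable as possible.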

Both work on sequences, for the input as well as the output, repeating their calculations step by step to produce the result that best fits the desired final representation in natural language.
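The attention step can be sketched in plain Python. At each output step the decoder scores every encoder state (one vector per source word), turns the scores into weights with a softmax, and takes the weighted average of the encoder states as its “context”. The article does not say which scoring function is used; this sketch assumes simple dot-product scoring, and the two-dimensional vectors are made-up examples, whereas real models use much larger, learned vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention(decoder_state, encoder_states):
    """One decoding step of dot-product attention: return (weights, context)."""
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: weighted sum of the encoder states, dimension by dimension.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Hypothetical encoder states, one per source word.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention(decoder_state, encoder_states)
print([round(w, 3) for w in weights])  # [0.468, 0.063, 0.468]
```

Source words whose encoder states point in the same direction as the current decoder state receive most of the weight. The decoder recomputes these weights at every output position, which is one of the repeated calculations the Encoder-Decoder with Attention performs.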

Following the latest advances in AI, Machine Learning, and Deep Learning, machines are increasingly capable of delivering great results. Beyond raw computing capacity and the ability to store glossaries without “forgetting” them, as humans normally do, Machine Learning and Deep Learning are the technologies that took Machine Translation to the next level. Today, MT offers significant advantages such as time savings, scalability, and cost-effectiveness. On the other hand, Machine Translation alone may not suit every case. Wherever creativity, idiomatic expressions, adaptation, and similar skills come into play, a human hand on the project is more than necessary: it is essential. In conclusion, human translation, or at least Post-editing (where a human steps in after the Machine Translation), is still the gold standard for the best quality in translation services.

Sources: Memsource | MemoQ | Smartcat | Machine Learning Mastery | Science Direct
Cover image by Dan Cristian Pădureț from Pexels