Google’s Multilingual Neural Machine Translation System
Dec 7, 2016
This post was contributed by Alan Mosca, a PhD student in Birkbeck’s Department of Computer Science and Information Systems. Alan tweets at @nitbix.
A Google research group has announced a breakthrough that could have a deep impact on the field of automated translation of documents and web pages.
In the recently released article “Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation”, they show how their Neural Machine Translation (NMT) system is able to translate between pairs of languages for which it has never seen any example translations.
In practice, this means that Google’s system is able to translate directly between two languages, without adopting the “trick” of translating via a bridge language. (Bridging is a technique commonly adopted in machine translation when no parallel corpora are available for a pair of languages: to translate between French and German, for example, a system would go French -> English -> German, and vice versa, using English as the bridging language; a minimal sketch of this composition appears below.) The underlying translation is learned through a common deep learning method called Long Short-Term Memory (LSTM), through which a machine can learn to translate between, say, English and French, and between English and German, by processing examples of translations.
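As a rough illustration of the bridging approach (the one Google’s system avoids), the sketch below composes two hypothetical translation functions, using English as the pivot. The translate_* functions are placeholders standing in for two separately trained single-pair models, not real APIs.

```python
# Illustrative sketch of bridge ("pivot") translation: French -> German is
# obtained by chaining two single-pair systems through English.
# The lookup tables are toy placeholders for real translation models.

def translate_fr_to_en(text: str) -> str:
    # Placeholder: a French -> English model would be called here.
    return {"Bonjour le monde": "Hello world"}.get(text, text)

def translate_en_to_de(text: str) -> str:
    # Placeholder: an English -> German model would be called here.
    return {"Hello world": "Hallo Welt"}.get(text, text)

def translate_fr_to_de_via_bridge(text: str) -> str:
    # No direct French-German model exists: English acts as the bridge.
    return translate_en_to_de(translate_fr_to_en(text))

print(translate_fr_to_de_via_bridge("Bonjour le monde"))  # -> "Hallo Welt"
```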
The exciting development is that all of this is achieved in a single model, which is able to operate on multiple language pairs. The model even appears to develop its own “internal representation” of concepts, independent of the specific languages it learns to translate. The examples in the paper are not limited to European languages, either: the system is able to translate between Japanese and Korean without having seen a single example pairing the two languages. An example of how this works is shown in Fig. 1.
All of this, of course, is done inside a deep learning model: an LSTM. Multilingual translation is achieved in the single model by prepending a token for the destination language to the input. For example, if one wanted to translate “Hello, my name is Bob” into Spanish, the input would be “<2es> Hello, my name is Bob”.
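A minimal sketch of this input convention is shown below, assuming a plain-text pipeline in which the artificial target-language token is simply prepended to the source sentence. The <2es> notation follows the paper; the helper function itself is hypothetical.

```python
# Sketch of the multilingual input convention: the destination language is
# signalled by an artificial token prepended to the source sentence, and the
# rest of the translation pipeline stays the same.

def add_target_token(sentence: str, target_lang: str) -> str:
    """Prepend the artificial target-language token, e.g. <2es> for Spanish."""
    return f"<2{target_lang}> {sentence}"

print(add_target_token("Hello, my name is Bob", "es"))
# -> "<2es> Hello, my name is Bob"
print(add_target_token("Hello, my name is Bob", "ja"))
# -> "<2ja> Hello, my name is Bob"
```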
A further exciting observation from the Google Brain researchers is that the system does not need to be told what language the input is in: it disambiguates difficult cases on its own. Take the word “burro”, for instance: it means “butter” in Italian but “donkey” in Spanish. Even for words that have the same spelling but different meanings in different languages, the system is usually able to discriminate based on context.
The model learns an “encoder” LSTM and a “decoder” LSTM, in an arrangement that resembles a multi-layer auto-encoder. Between the two sits an attention model, and the layer just before the attention is the one that outputs the “common encoding”: a semantic representation of the input that is language-independent.
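The sketch below shows the general encoder -> attention -> decoder shape described above, written in PyTorch purely for illustration; Google’s actual code is private and far larger. The layer sizes, the single LSTM layer per side, and the simple dot-product attention are assumptions made to keep the example short.

```python
# Compressed sketch of an encoder -> attention -> decoder model (not Google's
# actual architecture): an "encoder" LSTM reads the source sentence (with its
# <2xx> target token prepended), a "decoder" LSTM generates the target, and a
# dot-product attention lets each decoder step look back over encoder outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySeq2Seq(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encoder outputs: one vector per source position. The top encoder
        # layer is where a language-independent representation would sit.
        enc_out, enc_state = self.encoder(self.embed(src_ids))
        dec_out, _ = self.decoder(self.embed(tgt_ids), enc_state)
        # Dot-product attention over the encoder outputs.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))
        context = torch.bmm(F.softmax(scores, dim=-1), enc_out)
        return self.out(torch.cat([dec_out, context], dim=-1))

model = TinySeq2Seq(vocab_size=1000)
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences, 7 tokens each
tgt = torch.randint(0, 1000, (2, 5))   # shifted target tokens (teacher forcing)
logits = model(src, tgt)
print(logits.shape)                    # torch.Size([2, 5, 1000])
```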
Being Google, as well as testing on the standard machine translation benchmark datasets, they used their own internal dataset, which is probably very large and certainly very private. The code is very private too, but the researchers have given us an insight into the kind of infrastructure they needed: 100 (presumably state-of-the-art) GPUs, training for over three weeks. The results are impressive, beating state-of-the-art models dedicated to a single language pair in a few cases. For a single model that serves multiple language pairs, Google’s NMT system provides a great advantage, and we should expect ever better translations from Google Translate as a consequence.