Innovations in Machine Translation

Over the last few decades, machine translation has undergone a number of crucial changes. Because phrase-based statistical machine translation (SMT) has improved little in recent years, researchers have increasingly turned to neural machine translation (NMT) instead. Since its introduction in 2014, NMT systems have seen many refinements.

These systems are also known as sequence-to-sequence models. Initially they were fairly simple, consisting of just two recurrent networks: an encoder and a decoder. Even so, these encoder-decoder networks proved able to produce sentences that read naturally and coherently.
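To make that two-network design concrete, here is a minimal sketch of an encoder-decoder model in PyTorch. The class name, dimensions, and the greedy decoding loop are illustrative assumptions, not a reconstruction of any particular published system.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Two recurrent networks: an encoder and a decoder (illustrative)."""

    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, max_len=20, bos_id=1):
        # Encode the whole source sentence into one fixed-size state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode greedily, one token at a time, from that single vector.
        tok = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            dec_out, state = self.decoder(self.tgt_emb(tok), state)
            tok = self.out(dec_out).argmax(dim=-1)
            outputs.append(tok)
        return torch.cat(outputs, dim=1)

# Toy usage: Seq2Seq(src_vocab=100, tgt_vocab=100)(torch.tensor([[5, 9, 2]]))
```

Note how the encoder's final state is the only thing the decoder ever sees; that bottleneck is exactly the weakness discussed next.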

Believe it or not, early NMT systems worked at the word level: the encoder compressed the entire source sentence into a single fixed-size vector, from which the decoder generated the target sentence. These models worked well, but they had their pitfalls. First, when the input sentence was long, the encoder struggled to fit all the information into its fixed-size representation. Second, these systems could only work with a limited vocabulary.

These problems were partially solved in 2015 with the introduction of attention models. An attention mechanism lets the decoder focus on different parts of the input at each decoding step, producing a soft alignment between the source-language sentence and the target-language one, and the quality of the generated output improved noticeably. Furthermore, attention models are used in multi-modal settings, for tasks that encode an input image and generate output text.
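Here is a minimal sketch of the mechanism, using dot-product attention in PyTorch. The original 2015 attention model used a learned (additive) scoring function; the dot-product variant below is a simplification, and all tensor names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def attention(dec_state, enc_states):
    """Dot-product attention over the encoder states.

    dec_state:  (batch, hid)           current decoder hidden state
    enc_states: (batch, src_len, hid)  one state per source token
    """
    # Alignment scores: how relevant is each source position right now?
    scores = torch.bmm(enc_states, dec_state.unsqueeze(2)).squeeze(2)
    # Soft alignment between this target step and the source positions.
    weights = F.softmax(scores, dim=1)
    # Context vector: a weighted summary of the whole input sentence.
    context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
    return context, weights
```

Because the context vector is recomputed at every decoding step, the model no longer has to squeeze the whole sentence into one fixed-size vector.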

Another difficulty NMT systems face is out-of-vocabulary words, which can be translated badly: a word that is not found in the vocabulary may come out misspelled or in the wrong inflection. Much research has been done on this issue. According to one article, dividing words into subword units makes it easier to build a vocabulary. Using the Byte Pair Encoding (BPE) algorithm, words are segmented into subword units, and embeddings can then be generated for those specific units. This also covers compound words in the source language that are translated into a sequence of words in the target language.
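The core of BPE is simple enough to sketch in a few lines. The following Python function is a simplified version of the merge-learning loop described by Sennrich et al.; the toy corpus and the number of merges are illustrative.

```python
import re
from collections import Counter

def learn_bpe(corpus_words, num_merges):
    """Simplified BPE: repeatedly merge the most frequent adjacent
    symbol pair, growing a vocabulary of subword units."""
    # Start from characters, with a marker for the end of each word.
    vocab = Counter(" ".join(list(w) + ["</w>"]) for w in corpus_words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs.
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Merge the winning pair everywhere it occurs as whole symbols.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = Counter({pattern.sub("".join(best), w): f
                         for w, f in vocab.items()})
    return merges

# Toy run: learn_bpe(["lower", "lowest", "newer", "wider"], 10)
# yields merges such as ('w', 'e') and ('e', 'r'), i.e. subword units.
```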

A second paper presents a translation model that combines a BPE-generated vocabulary with a bi-scale recurrent neural network. The authors find that using BPE tokens on the encoder side together with a character-based decoder produces the best translations. To reach this conclusion, they evaluated language pairs such as English-German, English-Russian, English-Finnish, and English-Czech.
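The asymmetry is easy to see in the token streams themselves. The snippet below contrasts the two granularities; the German example, its segmentation, and the "@@" continuation marker (a convention from the subword-nmt toolkit) are illustrative assumptions, not examples taken from the paper.

```python
def char_tokens(sentence):
    """Character-level target tokens: a tiny closed vocabulary, so no
    word can ever be out of vocabulary on the decoding side."""
    return list(sentence)

# Encoder side: BPE subword units ('@@' marks a non-final subword).
src = ["un@@", "übersetz@@", "bar"]   # illustrative split of "unübersetzbar"
# Decoder side: one symbol per character.
tgt = char_tokens("untranslatable")
print(tgt)  # ['u', 'n', 't', 'r', 'a', 'n', 's', ...]
```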

These, then, are some of the most prominent recent achievements in the field of machine translation.