Rage against the machine?

With language being one of the most distinguishing features separating humans from other creatures, the suggestion that a software program can replicate this skill at the click of a button is a revelation in terms of computational achievement to some and laughable to others...

Machine translation (MT) is the use of software to translate speech or text from one language to another. At its most basic level, this is purely dictionary-based substitution of one isolated word for another, although the dictionaries can be customised to improve results. At a more complex level, Natural Language Processing (NLP) endeavours to program a computer to 'understand' text as a human would, rather than simply substituting word for word. The goal is to create translated text that sounds as if it has been written by a person. To do this successfully, however, the program must, like a human translator, be able to analyse the text in terms of its grammar, syntax and idiom, and the culture of its speakers.
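The basic, dictionary-based level can be sketched in a few lines of Python. This is only an illustration: the tiny English-to-French glossary and the function name `substitute` are invented for the example and do not come from any real MT product.

```python
# Hypothetical toy glossary (an assumption for illustration only);
# a real system would load a full, customised bilingual dictionary.
GLOSSARY = {"the": "le", "black": "noir", "cat": "chat"}


def substitute(sentence, glossary=GLOSSARY):
    """Replace each word with its glossary entry; unknown words pass through unchanged."""
    words = sentence.lower().split()
    return " ".join(glossary.get(word, word) for word in words)


print(substitute("The black cat"))  # prints "le noir chat"
```

The output is "le noir chat", whereas a French speaker would write "le chat noir". Word-for-word substitution has no notion of word order, which is exactly the gap the more complex approaches try to close.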

In addition to handling the complexities of the language itself, the program must be able to break text down into words, numbers and punctuation marks so that this information is correctly reproduced in the layout of the target text. Among the main difficulties translation programs have to cope with are ambiguity, in single words and in whole sentences, as well as word order and compound words.
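Breaking text down into words, numbers and punctuation marks might look something like the following Python sketch. The regular expression and the name `tokenise` are assumptions made for this example; production systems use far more careful, language-specific rules.

```python
import re

# Match, in order of preference: a number (optionally with , or . separators),
# a run of word characters, or a single punctuation mark.
TOKEN = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")


def tokenise(text):
    """Split text into word, number and punctuation tokens, discarding whitespace."""
    return TOKEN.findall(text)


print(tokenise("Send 1,250.50 euros today, please!"))
# prints ['Send', '1,250.50', 'euros', 'today', ',', 'please', '!']
```

Keeping the number as a single token matters because its separators may need converting as a unit; many European languages would write this figure as 1.250,50.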

Interlingual machine translation is one form of rule-based machine translation, the family to which services such as Babelfish belong. The source text is transformed through programmed rules into an interlingua, an interim codified language that the computer uses as an internal representation. The target-language text is then generated from this.

Example-based machine translation is an approach which uses a bilingual corpus and is essentially translation by analogy.

Statistical machine translation does not apply grammatical rules. It differs from earlier approaches in that it dispenses with the language experts who program grammatical rules and dictionaries into computers. Instead, translations are generated by comparing pairs of documents: one in the original language and the other its human translation. Patterns and links between the two are identified and used to produce future translations. The quality of the results reflects the amount of data used to generate the statistics: the more text the system is fed, the better it performs. Google currently offers statistical translation for Arabic, Chinese and Russian, and used documents from the European Commission and the United Nations (200 billion words) to train its machines.
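To give a flavour of how links between paired texts can be found without any grammatical rules, here is a deliberately simplified Python sketch. It counts how often words appear together in aligned sentence pairs and scores each candidate with the Dice coefficient. The four-sentence corpus and the names `train` and `best_match` are invented for illustration; this is not a description of how Google's system, or any real statistical engine, works.

```python
from collections import Counter
from itertools import product

# Hypothetical miniature parallel corpus: (English, human French translation).
CORPUS = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]


def train(pairs):
    """Count word occurrences and source/target co-occurrences across sentence pairs."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    return src_count, tgt_count, joint


def best_match(word, model):
    """Return the target word most strongly associated with `word`, or None if unseen."""
    src_count, tgt_count, joint = model
    scores = {
        t: 2 * n / (src_count[s] + tgt_count[t])  # Dice coefficient
        for (s, t), n in joint.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None


model = train(CORPUS)
print(best_match("house", model))  # prints "maison"
```

With only four sentence pairs the program already pairs "house" with "maison" and "the" with "la", simply because they keep turning up together. Scaling the same idea to billions of words is what makes the statistics dependable.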

Total success is still a long way off but machine translation programs are used by a number of world organisations, one of the largest institutional users being the European Commission. It uses a highly customised version of the commercial MT system SYSTRAN to automatically translate documents for internal use.

With Google seeing successful results from its statistical approach with its initial language pairs, developments in machine translation look set to continue.

Machine translation results, though sometimes helpful to individuals for getting the gist of a text or to institutions for handling large volumes, cannot yet replace human translation for accuracy. Proofreading, editing and correction will always be required, even with the best results. With the worst, complete re-translation will be necessary, with a human touch.

Back to the April 2007 edition

Back to The Lingo-ist index