Azure Text Translation

This MLHub package provides a quick introduction to the pre-built Text Translation models provided through Azure's Cognitive Services. This service translates text between many languages and can identify the source language automatically. This package is part of the Azure on MLHub repository.

In addition to the demo command, this package provides a collection of commands that turn the service into useful command line tools for translation and transliteration.

A free Azure subscription allowing up to 2,000,000 characters of translation per month is available from https://azure.microsoft.com/free/ as the F0 pricing tier. After subscribing, visit https://ms.portal.azure.com and Create a resource under AI and Machine Learning called Text Translations. Once it is created, you can access the web API subscription key and endpoint from the portal. You will be prompted for these when running a command, and they are then saved to file to reduce the need for repeated authentication requests.

Please note that these Azure models, unlike the MLHub models in general, use closed source services which have no guarantee of ongoing availability and do not come with the freedom to modify and share.

Visit the GitHub repository for more details: https://github.com/azure/aztranslate

The Python code is based on the Azure Text Translator Quick Start.

Usage

$ pip3 install mlhub
$ ml install   aztranslate
$ ml configure aztranslate

Command Line Tools

In addition to the demo presented below, the aztranslate package provides a number of useful command line tools, several of which are demonstrated here. Most commands accept text as command line arguments, piped in on standard input, read from a supplied file, or entered through an interactive session.

supported

The supported command is useful in checking which languages are supported for translation.

$ ml supported aztranslate
af,ltr,Afrikaans,Afrikaans
ar,rtl,Arabic,العربية
bg,ltr,Bulgarian,Български
bn,ltr,Bangla,বাংলা
bs,ltr,Bosnian,bosanski (latinica)
ca,ltr,Catalan,Català
cs,ltr,Czech,Čeština
cy,ltr,Welsh,Welsh
da,ltr,Danish,Dansk
de,ltr,German,Deutsch
el,ltr,Greek,Ελληνικά
en,ltr,English,English
es,ltr,Spanish,Español
et,ltr,Estonian,Eesti
fa,rtl,Persian,Persian
fi,ltr,Finnish,Suomi
fil,ltr,Filipino,Filipino
fj,ltr,Fijian,Fijian
fr,ltr,French,Français
he,rtl,Hebrew,עברית
hi,ltr,Hindi,हिंदी
hr,ltr,Croatian,Hrvatski
ht,ltr,Haitian Creole,Haitian Creole
hu,ltr,Hungarian,Magyar
id,ltr,Indonesian,Indonesia
is,ltr,Icelandic,Íslenska
it,ltr,Italian,Italiano
ja,ltr,Japanese,日本語
ko,ltr,Korean,한국어
lt,ltr,Lithuanian,Lietuvių
lv,ltr,Latvian,Latviešu
mg,ltr,Malagasy,Malagasy
ms,ltr,Malay,Melayu
mt,ltr,Maltese,Il-Malti
mww,ltr,Hmong Daw,Hmong Daw
nb,ltr,Norwegian,Norsk
nl,ltr,Dutch,Nederlands
otq,ltr,Querétaro Otomi,Querétaro Otomi
pl,ltr,Polish,Polski
pt,ltr,Portuguese,Português
ro,ltr,Romanian,Română
ru,ltr,Russian,Русский
sk,ltr,Slovak,Slovenčina
sl,ltr,Slovenian,Slovenščina
sm,ltr,Samoan,Samoan
sr-Cyrl,ltr,Serbian (Cyrillic),srpski (ćirilica)
sr-Latn,ltr,Serbian (Latin),srpski (latinica)
sv,ltr,Swedish,Svenska
sw,ltr,Kiswahili,Kiswahili
ta,ltr,Tamil,தமிழ்
te,ltr,Telugu,తెలుగు
th,ltr,Thai,ไทย
tlh,ltr,Klingon,Klingon
to,ltr,Tongan,lea fakatonga
tr,ltr,Turkish,Türkçe
ty,ltr,Tahitian,Tahitian
uk,ltr,Ukrainian,Українська
ur,rtl,Urdu,اردو
vi,ltr,Vietnamese,Tiếng Việt
yua,ltr,Yucatec Maya,Yucatec Maya
yue,ltr,Cantonese (Traditional),粵語 (繁體中文)
zh-Hans,ltr,Chinese Simplified,简体中文
zh-Hant,ltr,Chinese Traditional,繁體中文

To check if a specific language is supported (an unsupported code, such as ku below, produces no output):

$ ml supported aztranslate fr
fr,ltr,French,Français

$ ml supported aztranslate ku

Use the --header command line option to list the header row which names the columns:

$ ml supported aztranslate --header fr
code,direction,name,native
fr,ltr,French,Français
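
Because the output is comma separated with an optional header row, it can be read with standard CSV tooling. The following is a minimal sketch in Python, assuming the output of the --header form has been captured as text. The function name parse_supported and the abridged sample are illustrative only and are not part of the package.

```python
import csv
import io

# Abridged sample in the format shown above (header row plus two languages).
SAMPLE = """\
code,direction,name,native
fr,ltr,French,Français
ar,rtl,Arabic,العربية
"""


def parse_supported(text):
    """Return a dict mapping each language code to its row of fields."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["code"]: row for row in reader}


languages = parse_supported(SAMPLE)

# Pick out the right-to-left languages using the direction column.
rtl_codes = [code for code, row in languages.items() if row["direction"] == "rtl"]

print(languages["fr"]["native"])
print(rtl_codes)
```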

The --transliterate option will identify the transliteration pairs available for each language.

$ ml supported aztranslate --transliterate
ar,Arabic,العربية,Arab:Latn Latn:Arab 
bn,Bangla,বাংলা,Beng:Latn Latn:Beng 
gu,Gujarati,ગુજરાતી,Gujr:Latn Latn:Gujr 
he,Hebrew,עברית,Hebr:Latn Latn:Hebr 
hi,Hindi,हिंदी,Deva:Latn Latn:Deva 
ja,Japanese,日本語,Jpan:Latn Latn:Jpan 
kn,Kannada,ಕನ್ನಡ,Knda:Latn Latn:Knda 
ml,Malayalam,മലയാളം,Mlym:Latn Latn:Mlym 
mr,Marathi,मराठी,Deva:Latn Latn:Deva 
or,Oriya,Oriya,Orya:Latn Latn:Orya 
pa,Punjabi,ਪੰਜਾਬੀ,Guru:Latn Latn:Guru 
sr-Cyrl,Serbian (Cyrillic),srpski (ćirilica),Cyrl:Latn 
sr-Latn,Serbian (Latin),srpski (latinica),Latn:Cyrl 
ta,Tamil,தமிழ்,Taml:Latn Latn:Taml 
te,Telugu,తెలుగు,Telu:Latn Latn:Telu 
th,Thai,ไทย,Thai:Latn Latn:Thai 
zh-Hans,Chinese Simplified,简体中文,Hans:Latn Hans:Hant Latn:Hans Latn:Hant 
zh-Hant,Chinese Traditional,繁體中文,Hant:Latn Hant:Hans Latn:Hans Latn:Hant 

The four-letter script names are reported as pairs in from:to order.
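
As a rough illustration of that layout, the sketch below splits one line of this output into its fields and a list of (from, to) script pairs. It assumes the from:to form with space separated pairs and a possible trailing space. The function name parse_transliteration is hypothetical.

```python
def parse_transliteration(line):
    """Split one output line into (code, name, native, [(from_script, to_script), ...])."""
    # The first three fields are code, English name and native name;
    # everything after the third comma is the space separated list of pairs.
    code, name, native, pairs = line.rstrip().split(",", 3)
    scripts = [tuple(pair.split(":")) for pair in pairs.split()]
    return code, name, native, scripts


code, name, native, scripts = parse_transliteration(
    "zh-Hans,Chinese Simplified,简体中文,Hans:Latn Hans:Hant Latn:Hans Latn:Hant "
)
print(code, scripts[0])
```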

detect

The detect command will identify the language of a provided text, the confidence of the detection, and whether translation and transliteration are supported for that language.

$ ml detect aztranslate उनकी कविता में प्रकृति के सौंदर्य और कोमलतम मानवीय भावनाओं का उत्कृष्ट चित्रण है.
hi,1.00,True,True
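
The four fields correspond closely to what the Translator v3 detect operation returns. The sketch below shows one way such a response could be flattened into the line above. It assumes the response is a JSON list whose first element carries language, score, isTranslationSupported and isTransliterationSupported. How the package itself formats the line is an assumption, and format_detect is a hypothetical name.

```python
def format_detect(response):
    """Flatten the first detection result into a comma separated line."""
    first = response[0]
    return ",".join([
        first["language"],
        f"{first['score']:.2f}",                   # confidence to two decimals
        str(first["isTranslationSupported"]),
        str(first["isTransliterationSupported"]),
    ])


# Hand-written stand-in for a decoded JSON response; no API call is made here.
sample = [{
    "language": "hi",
    "score": 1.0,
    "isTranslationSupported": True,
    "isTransliterationSupported": True,
}]
print(format_detect(sample))  # hi,1.00,True,True
```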

translate

The translate command takes a text to be translated and returns the identified source language code, the confidence of that identification, the language code for the target translation, and the resulting translation.

$ ml translate aztranslate मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ
hi,1.00,en,Tell me the most important message this morning

$ ml translate aztranslate उनकी कविता में प्रकृति के सौंदर्य और कोमलतम मानवीय भावनाओं का उत्कृष्ट चित्रण है.
hi,1.00,en,His poetry has excellent depictions of nature's beauty and the softest human emotions.
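
When post-processing this output, note that the translation itself may contain commas, so splitting on every comma can truncate it. A minimal sketch of a safer approach is to split on the first three commas only. The names Translation and parse_translation are illustrative, the sample line is made up in the same format, and the sketch does not cover the five-field --keep form.

```python
from typing import NamedTuple


class Translation(NamedTuple):
    source: str   # detected source language code
    score: float  # confidence of the detection
    target: str   # target language code
    text: str     # translated text, which may itself contain commas


def parse_translation(line):
    """Parse one 'source,score,target,text' line, keeping commas in the text."""
    source, score, target, text = line.rstrip("\n").split(",", 3)
    return Translation(source, float(score), target, text)


result = parse_translation("hi,1.00,en,Hi Tom, has my parcel come yet?")
print(result.text)
```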

As with other command line tools, the text to be translated can be piped into the command:

$ echo मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ | ml translate aztranslate
hi,1.00,en,Tell me the most important message this morning

If a file name is supplied then each line within the file is translated, line by line:

$ ml translate aztranslate thai.txt
th,1.00,en,Congee
th,1.00,en,Rice kan Chin
th,1.00,en,Pork Leg Rice
th,1.00,en,Rice Omelet
th,1.00,en,Fried rice with shrimp paste
th,1.00,en,Kao Mok Chicken
th,1.00,en,Beef Porridge
th,1.00,en,Chicken Rice
th,1.00,en,Crispy pork rice
th,1.00,en,Red Pork crispy Pork rice

Use the --keep command line option to retain the original text:

$ ml translate aztranslate --keep scratch/thai_menu.txt
th,1.00,en,โจ๊ก,Congee
th,1.00,en,ข้าวกั๊นจิ๊น,Rice kan Chin
th,1.00,en,ข้าวขาหมู,Pork Leg Rice
th,1.00,en,ข้าวไข่เจียว,Rice Omelet
th,1.00,en,ข้าวคลุกกะปิ,Fried rice with shrimp paste
th,1.00,en,ข้าวหมกไก่,Kao Mok Chicken
th,1.00,en,ข้าวหมกเนื้อ,Beef Porridge
th,1.00,en,ข้าวมันไก่,Chicken Rice
th,1.00,en,ข้าวหมูกรอบ,Crispy pork rice
th,1.00,en,ข้าวหมูกรอบหมูแดง,Red Pork crispy Pork rice

The --profanity command line option will replace any identified profanities in the translation with asterisks.

If no text is supplied on the command line, through a pipe, or from a specified file, the program enters an interactive loop:

$ ml translate aztranslate
Enter text to be analysed. Quit with Empty or Ctrl-d.

> मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ?
hi,1.00,en,Tell me the most important message this morning?

> ข้าวคลุกกะปิ
th,1.00,en,Fried rice with shrimp paste

> Di mana toko yang baik untuk membeli ponsel?
id,1.00,en,Where is a good store to buy mobile phones?

> 

The default is to translate into English (en). Other languages can be chosen:

$ ml translate aztranslate --to=id मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ
hi,1.00,id,Ceritakan pesan yang paling penting pagi ini

$ ml translate aztranslate --to=fr मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ
hi,1.00,fr,Dites-moi le message le plus important ce matin

By default the translator will determine the source language. This can be overridden using --from=.
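
For readers curious how these options relate to the underlying service, the sketch below builds (but does not send) a Translator v3 translate request. It assumes the global endpoint shown, a resource that needs only the subscription key header, and that --to and --from map onto the to and from query parameters. The function name and defaults are illustrative and are not taken from the package.

```python
import json
from urllib.parse import urlencode

# Assumed global endpoint; use the endpoint shown for your resource in the portal.
ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(text, key, to="en", source=None, endpoint=ENDPOINT):
    """Return (url, headers, body) for a translate call without sending it."""
    params = {"api-version": "3.0", "to": to}
    if source is not None:
        # Omitting 'from' leaves source language detection to the service.
        params["from"] = source
    url = f"{endpoint}/translate?{urlencode(params)}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
        # Regional resources may also need an Ocp-Apim-Subscription-Region header.
    }
    body = json.dumps([{"Text": text}], ensure_ascii=False).encode("utf-8")
    return url, headers, body


url, headers, body = build_translate_request("Wah kayak artis Korea", "<your-key>", to="id")
print(url)

# To send it with the standard library (not executed here):
#   import urllib.request
#   req = urllib.request.Request(url, data=body, headers=headers, method="POST")
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```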

Translation engines are trained on different data and so differ in their capabilities. For example, Google translates the Indonesian "Wah kayak artis Korea" into "Wow, like a Korean artist", whilst Azure translates it as "Wah Kayaking Korean artist". Such differences can affect downstream processing such as sentiment analysis.

$ ml translate aztranslate Wah kayak artis Korea
id,1.0,en,Wah Kayaking Korean artist

$ ml translate aztranslate Wah kayak artis Korea | cut -d, -f4 | ml sentiment aztext
0.50

$ ml sentiment aztext Wow, like a Korean artist
0.97

transliterate

The transliterate command takes a text to be transliterated, for example into Latin characters, retaining the phonetics. This command is under development and currently only supports transliteration from Thai script to Latin script for illustrative purposes.

$ ml transliterate aztranslate คั่ว กลิ้ง แกง ยอด มะพร้าว อ่อน ใส่ ไก่
khua kling kaeng yot maphrao on sai kai

$ ml translate aztranslate คั่ว กลิ้ง แกง ยอด มะพร้าว อ่อน ใส่ ไก่
th,1.00,en,Roasted Coconut curry with chicken

Normally the LANGUAGE can be determined automatically, and the first script reported for that language by the supported --transliterate command is the default FROM script. The default TO script is Latin. Command line options can be used to specify the LANGUAGE, the FROM script and the TO script if required. Specifying them also avoids the additional API calls (to determine the language and the default FROM script) otherwise made for each query.

$ ml transliterate aztranslate -l th -f thai -t latn คั่ว กลิ้ง แกง ยอด มะพร้าว อ่อน ใส่ ไก่
th,thai,latn,khua kling kaeng yot maphrao on sai kai
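
The three options line up with the parameters of the Translator v3 transliterate operation. As a hedged sketch, the request could be built as follows, assuming the global endpoint and that -l, -f and -t map onto the language, fromScript and toScript query parameters. The function name is hypothetical and nothing is sent.

```python
import json
from urllib.parse import urlencode

# Assumed global endpoint; use the endpoint shown for your resource in the portal.
ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_transliterate_request(text, key, language, from_script,
                                to_script="latn", endpoint=ENDPOINT):
    """Return (url, headers, body) for a transliterate call without sending it."""
    params = {
        "api-version": "3.0",
        "language": language,
        "fromScript": from_script,
        "toScript": to_script,
    }
    url = f"{endpoint}/transliterate?{urlencode(params)}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": text}], ensure_ascii=False).encode("utf-8")
    return url, headers, body


url, headers, body = build_transliterate_request(
    "คั่ว กลิ้ง แกง ยอด มะพร้าว อ่อน ใส่ ไก่", "<your-key>", "th", "thai"
)
print(url)
```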

Demonstration

$ ml demo aztranslate
======================
Azure Text Translation
======================

Welcome to a demo of the pre-built models for Text Translation provided
through Azure's Cognitive Services. This service translates text between
multiple languages.

The following file has been found and is assumed to contain an Azure Text
Translator subscription key. We will load the file and use this information.

    /home/gjw/.mlhub/aztranslate/private.py

Press Enter to continue: 

===================
Supported Languages
===================

These are the languages supported by the Azure Translator for translation.

af      ltr Afrikaans                 Afrikaans                
ar      rtl Arabic                    العربية                  
bg      ltr Bulgarian                 Български                
bn      ltr Bangla                    বাংলা                    
bs      ltr Bosnian                   bosanski (latinica)      
ca      ltr Catalan                   Català                   
cs      ltr Czech                     Čeština                  
cy      ltr Welsh                     Welsh                    
da      ltr Danish                    Dansk                    
de      ltr German                    Deutsch                  
el      ltr Greek                     Ελληνικά                 
en      ltr English                   English                  
es      ltr Spanish                   Español                  
et      ltr Estonian                  Eesti                    
fa      rtl Persian                   Persian                  
fi      ltr Finnish                   Suomi                    
fil     ltr Filipino                  Filipino                 
fj      ltr Fijian                    Fijian                   
fr      ltr French                    Français                 

Press Enter to continue: 

he      rtl Hebrew                    עברית                    
hi      ltr Hindi                     हिंदी                    
hr      ltr Croatian                  Hrvatski                 
ht      ltr Haitian Creole            Haitian Creole           
hu      ltr Hungarian                 Magyar                   
id      ltr Indonesian                Indonesia                
is      ltr Icelandic                 Íslenska                 
it      ltr Italian                   Italiano                 
ja      ltr Japanese                  日本語                      
ko      ltr Korean                    한국어                      
lt      ltr Lithuanian                Lietuvių                 
lv      ltr Latvian                   Latviešu                 
mg      ltr Malagasy                  Malagasy                 
ms      ltr Malay                     Melayu                   
mt      ltr Maltese                   Il-Malti                 
mww     ltr Hmong Daw                 Hmong Daw                
nb      ltr Norwegian                 Norsk                    
nl      ltr Dutch                     Nederlands               
otq     ltr Querétaro Otomi           Querétaro Otomi          
pl      ltr Polish                    Polski                   

Press Enter to continue: 

pt      ltr Portuguese                Português                
ro      ltr Romanian                  Română                   
ru      ltr Russian                   Русский                  
sk      ltr Slovak                    Slovenčina               
sl      ltr Slovenian                 Slovenščina              
sm      ltr Samoan                    Samoan                   
sr-Cyrl ltr Serbian (Cyrillic)        srpski (ćirilica)        
sr-Latn ltr Serbian (Latin)           srpski (latinica)        
sv      ltr Swedish                   Svenska                  
sw      ltr Kiswahili                 Kiswahili                
ta      ltr Tamil                     தமிழ்                    
te      ltr Telugu                    తెలుగు                   
th      ltr Thai                      ไทย                      
tlh     ltr Klingon                   Klingon                  
to      ltr Tongan                    lea fakatonga            
tr      ltr Turkish                   Türkçe                   
ty      ltr Tahitian                  Tahitian                 
uk      ltr Ukrainian                 Українська               
ur      rtl Urdu                      اردو                     
vi      ltr Vietnamese                Tiếng Việt             

Press Enter to continue: 

yua     ltr Yucatec Maya              Yucatec Maya             
yue     ltr Cantonese (Traditional)   粵語 (繁體中文)                
zh-Hans ltr Chinese Simplified        简体中文                     
zh-Hant ltr Chinese Traditional       繁體中文                     

That's 64 languages in total.

Press Enter to continue on to translations from English: 

=============================
Text Translation from English
=============================

Below we demonstrate the translation of a variety of common phrases as we might
find when interacting with a voice command system.

    Hi Tom, has my parcel arrived yet?
    Where is a good shop to buy mobile phones?
    Has Frederick replied to my email yet?
    We are running late, please start without us.
    Tell me the most important message this morning?
    When is a good time to meet Susan and Dave?

The supplied text was detected as 'en' with a score of '1.0'.

Press Enter for a translation to German: 

    Hallo Tom, ist mein Paket schon angekommen?
    Wo gibt es einen guten Laden, um Handys zu kaufen?
    Hat Frederick meine E-Mail schon beantwortet?
    Wir laufen spät, bitte starten wir ohne uns.
    Sagen Sie mir heute Morgen die wichtigste Botschaft?
    Wann ist ein guter Zeitpunkt, um Susan und Dave zu treffen?

Press Enter for a translation to Italian: 

    Ciao Tom, il mio pacco è arrivato ancora?
    Dove è un buon negozio per comprare telefoni cellulari?
    Frederick ha risposto alla mia email ancora?
    Siamo in ritardo, per favore iniziate senza di noi.
    Dimmi il messaggio più importante di stamattina?
    Quando è il momento giusto per incontrare Susan e Dave?

Press Enter for a translation to Indonesian: 

    Hi Tom, telah paket saya tiba belum?
    Di mana toko yang baik untuk membeli ponsel?
    Apakah Frederick membalas email saya?
    Kami berjalan terlambat, silakan mulai tanpa kami.
    Ceritakan pesan yang paling penting pagi ini?
    Kapan waktu yang baik untuk bertemu Susan dan Dave?

Press Enter for a translation to Hindi: 

    हाय टॉम, मेरे पार्सल अभी तक आ गया है?
    मोबाइल फोन खरीदने के लिए एक अच्छी दुकान कहां है?
    क्या Frederick मेरे ईमेल के लिए अभी तक जवाब दिया?
    हम देर से चल रहे हैं, कृपया हमारे बिना शुरू करो ।
    मुझे सबसे महत्वपूर्ण संदेश आज सुबह बताओ?
    जब एक अच्छा समय Susan और डेव से मिलने के लिए है?

Press Enter to continue on to translations back to English: 

===========================
Translation back to English
===========================

Below we translate each of the above translations back to English. Again the 
source language is automatically identified.

Here's a reminder of the original English utterances:

    Hi Tom, has my parcel arrived yet?
    Where is a good shop to buy mobile phones?
    Has Frederick replied to my email yet?
    We are running late, please start without us.
    Tell me the most important message this morning?
    When is a good time to meet Susan and Dave?


Press Enter for the translation from German (language id score=0.98): 

    HI Tom, has my Package arrived yet?
    Where is a good Store to buy Phones?
    Has Frederick already answered my email?
    We run late, please start without us.
    Tell me the most important Message this Morning?
    When is a good Time to meet Susan and Dave?

Press Enter for the translation from Italian (language id score=0.94): 

    Hello Tom, my parcel has arrived yet?
    Where is a good store to buy cell phones?
    Has Frederick responded to my email yet?
    We'Re late, please start without us.
    Tell me the most important message this morning?
    When is the right time to meet Susan and Dave?

Press Enter for the translation from Indonesian (language id score=0.98): 

    Hi Tom, have my package arrived yet?
    Where is a good store to buy a cell phone?
    Did Frederick replied to my email?
    We are running late, please start without us.
    Tell me the most important message this morning?
    When is a good time to meet Susan and Dave?

Press Enter for the translation from Hindi (language id score=0.97): 

    Hi Tom, has my parcel come yet?
    Where is a good shop to buy mobile phones?
    Has Frederick responded to my email yet?
    We are running late, please start without us.
    Tell me the most important message this morning?
    When is a good time to meet Susan and Dave?

To use the model to translate user provided text:

  $ ml translate aztranslate

Interactive Use

The translate command can also be used interactively. Here we enter a few texts in different languages and have them translated into English. Note that the quality of the translation varies: translation from Indonesian is not as well developed as for some other languages.

$ ml translate aztranslate
=================================
Azure Text Translation to English
=================================

The following file has been found and is assumed to contain an Azure Text
Translator subscription key. We will load the file and use this information.

    /home/gjw/.mlhub/aztranslate/private.py

Enter a line of text in any language and we'll attempt to translate it to English.

Exit when no text supplied.

> सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है। उन्हें
> बुद्धि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिये।

The text was identified as Hindi with 100% certainty:

  English: All human beings have inherent freedom and equality in 
  terms of pride and rights. They have the wisdom and the conscience,
  and they must behave in a spirit of brotherhood.

> C’est l’exception qui confirme la règle.

The text was identified as French with 100% certainty:

  English: This is the exception that confirms the rule.

> Dimana ada kemauan, di situ ada jalan

The text was identified as Indonesian with 100% certainty:

  English: Where there's a will, there is no way

> 

To explore limitations of translations:

  $ ml limits aztranslate

Limitations of Translations

Douglas Hofstadter, a professor of cognitive science and comparative literature at Indiana University at Bloomington and author of the book Gödel, Escher, Bach, highlights in a January 2018 article in The Atlantic the limitations of automated language translation. To paraphrase, the translators do not have any deep understanding of the text but have developed a shallower mechanical process to do a decent job for simple communications.

Below we illustrate with one of Hofstadter's examples, which you can replicate with the limits command. See the original article for details:

https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/

$ ml limits aztranslate

[...]

*** Consider this sample text:

In their house, everything comes in pairs. There's his car and her
car, his towels and her towels, and his library and hers.

*** The French translation is:

Dans leur maison, tout se passe par paires. Il y a sa voiture, sa
voiture, ses serviettes, ses serviettes, sa bibliothèque et la sienne.

*** Translating back to English demonstrates a shallow understanding:

In their House, everything happens in pairs. There's his car, his car,
his towels, his towels, his library and hers.
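
One simple way to see what was lost in the round trip is to compare the original and the back-translation word by word. The sketch below does this with Python's difflib, using the two English texts above. It is only an illustrative measure, not something the limits command is known to compute, and word_drift is a hypothetical name.

```python
import difflib

original = ("In their house, everything comes in pairs. There's his car and her "
            "car, his towels and her towels, and his library and hers.")
round_trip = ("In their House, everything happens in pairs. There's his car, his car, "
              "his towels, his towels, his library and hers.")


def word_drift(a, b):
    """Return (similarity ratio, list of (original words, replacement words))."""
    a_words, b_words = a.lower().split(), b.lower().split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    changes = [
        (" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return matcher.ratio(), changes


ratio, changes = word_drift(original, round_trip)
print(f"similarity: {ratio:.2f}")
for old, new in changes:
    print(f"{old!r} -> {new!r}")
```

The word "her" appears only on the original side of the reported changes, which is the loss of gender that Hofstadter's example is designed to expose.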

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.