New Software That Detects up to 25 Arabic Dialects Launches to the Public

The demo paper on Automatic Dialect Identification for Arabic has been presented at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics

July 8, 2019 - Researchers from several institutions recently developed a public online interface called Automatic Dialect Identification for Arabic (ADIDA). It identifies the dialects of 25 Arab cities, from Rabat to Muscat, as well as Modern Standard Arabic (MSA), and displays the results visually on a map.

ADIDA was presented at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019). It was developed under the Multi Arabic Dialect Applications and Resources (MADAR) project by Nizar Habash, Associate Professor of Computer Science at NYU Abu Dhabi (NYUAD) and Director of NYUAD’s Computational Approaches to Modeling Language (CAMeL) Lab, together with his team and researchers from other institutions.

Users can enter Arabic text into ADIDA, which displays the results as a point map or a heat map overlaid on a geographical map of the Arab World, based on the likelihood that the input comes from each of the 25 cities. Modern Standard Arabic is left off the map, because no specific geographical location can represent it. The interface also lists the five most likely cities with their probabilities, together with the probability of Modern Standard Arabic.
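To make the display logic concrete, the following Python sketch shows one way to turn a classifier's probability distribution into the two views described: a ranked list of the five most likely cities plus the MSA probability, and a map layer with MSA removed. The label `MSA`, the function names `summarize` and `mappable`, and the example probabilities are all hypothetical; this is an illustration under those assumptions, not ADIDA's actual code or API.

```python
from typing import Dict, List, Tuple

# Hypothetical label for Modern Standard Arabic in the classifier output.
MSA = "MSA"


def summarize(probs: Dict[str, float], k: int = 5) -> Tuple[List[Tuple[str, float]], float]:
    """Return the k most likely cities (MSA excluded) and the MSA probability."""
    cities = [(label, p) for label, p in probs.items() if label != MSA]
    # sorted() is stable even with reverse=True, so tied cities keep their input order.
    top_k = sorted(cities, key=lambda item: item[1], reverse=True)[:k]
    return top_k, probs.get(MSA, 0.0)


def mappable(probs: Dict[str, float]) -> Dict[str, float]:
    """Drop MSA, which has no single geographical location to plot."""
    return {label: p for label, p in probs.items() if label != MSA}


if __name__ == "__main__":
    # Made-up probabilities for illustration only; not real ADIDA output.
    example = {
        "Cairo": 0.40, "Beirut": 0.20, "Sana'a": 0.10, "Rabat": 0.05,
        "Muscat": 0.05, "Amman": 0.02, MSA: 0.18,
    }
    top, msa_p = summarize(example)
    for city, p in top:
        print(f"{city}: {p:.2f}")
    print(f"MSA: {msa_p:.2f}")
```

Keeping MSA in the ranked summary but out of the map layer mirrors the distinction the interface draws between listing a probability and placing it geographically.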

Nizar Habash, Associate Professor of Computer Science, NYUAD

Commenting on the research, Habash notes: “Dialect identification is an important enabling technology that has the potential to support a range of language artificial intelligence applications through better user dialect profiling. For example, dialect-aware machine translation or chatbots can determine whether the correct meaning of the word ماشي /m aa sh i/ is ‘ok’ (in Cairo and Beirut) or ‘no’ (in Sana’a). The confidence and accuracy of the system generally increase with the length of the input sentence, as many short phrases and words can belong to different dialects.”

The interface projects the probability distribution onto a two-dimensional geographical map, allowing users to easily observe connections and patterns in dialectal similarities and differences, and to spot concentrations of probability across nearby cities that give a sense of regional presence.
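The article does not say how ADIDA aggregates the probabilities of nearby cities, so the following Python sketch swaps in one simple technique to illustrate the idea: for each city, sum the probabilities of every city within a fixed great-circle radius (computed with the haversine formula). The 300 km radius, the approximate coordinates, the six-city subset, and the function names are hypothetical choices for illustration, not the method used in the paper.

```python
import math
from typing import Dict, Tuple

# Approximate (latitude, longitude) in degrees; illustrative values only.
COORDS: Dict[str, Tuple[float, float]] = {
    "Rabat": (34.02, -6.84),
    "Cairo": (30.04, 31.24),
    "Beirut": (33.89, 35.50),
    "Amman": (31.95, 35.93),
    "Sana'a": (15.37, 44.19),
    "Muscat": (23.59, 58.41),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def regional_mass(probs: Dict[str, float], radius_km: float = 300.0) -> Dict[str, float]:
    """For each city, sum the probabilities of all cities within radius_km (itself included)."""
    return {
        city: sum(p for other, p in probs.items()
                  if haversine_km(COORDS[city], COORDS[other]) <= radius_km)
        for city in probs
    }


if __name__ == "__main__":
    # Made-up city probabilities (MSA already removed); not real ADIDA output.
    city_probs = {"Cairo": 0.40, "Beirut": 0.20, "Amman": 0.02,
                  "Sana'a": 0.10, "Rabat": 0.05, "Muscat": 0.05}
    for city, mass in regional_mass(city_probs).items():
        print(f"{city}: {mass:.2f}")
```

With these made-up numbers, Beirut and Amman (about 220 km apart) each receive a regional mass of 0.22, while Cairo, roughly 500 km from Amman, keeps only its own 0.40. A heat map achieves a similar effect with a smooth kernel instead of a hard cutoff.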

For further information about ADIDA, please download the demo paper.