Researchers Develop Large-scale Readability Leveled Thesaurus in Arabic

The one-of-a-kind interface provides possible roots, English glosses, related Arabic words and phrases, and a five-level readability scale for a user-provided Arabic word

Press Release

Researchers from NYU Abu Dhabi (NYUAD) have developed an Online Readability Leveled Arabic Thesaurus. The work was conducted by Associate Professor of Practice of Arabic Language Muhamed Al Khalil in collaboration with Professor of Computer Science Nizar Habash, who also leads the Computational Approaches to Modeling Language (CAMeL) Lab.

The one-of-a-kind interface takes an Arabic word entered by the user and returns its possible roots, English glosses, related Arabic words and phrases, and its readability level on a five-level scale. It also connects multiple existing Arabic resources and processing tools, enabling Arabic speakers and learners to benefit from recent advances in Arabic computational linguistics technologies.

The interface is one of the products of the NYUAD-funded project Simplification of Arabic Masterpieces for Extensive Reading (SAMER), and a demo version is available for public use.

A collaboration between NYUAD’s Arabic Studies Program and CAMeL Lab, SAMER seeks to create a standard for simplifying modern Arabic fiction for school-age learners and to use this standard to simplify a number of Arabic fiction masterpieces.

 

“Arabic is one of the UN’s six official languages; it is the language of hundreds of millions of people in the Arab world and beyond. It is extraordinarily rich linguistically, but with that comes higher complexity and a steeper learning curve. Add to this the fact that the standard form of Arabic used in education and media is not the daily form spoken by modern-day Arabs, who speak a variety of its dialects. As such, there is a great need to have user-friendly tools supporting Arabic teachers and learners. We hope this will be an important aid in filling this learning gap.”

Associate Professor of Practice of Arabic Language Muhamed Al Khalil
 

“Arabic poses many difficulties for Artificial Intelligence, some of which are similar to those facing new learners: it has a very rich word structure, a highly ambiguous spelling system, and many dialects. The resources we developed have great potential for developing smart technologies that can assist natives and learners interested in writing and reading in Arabic.”

Professor of Computer Science Nizar Habash

Established in September 2014, the CAMeL Lab pursues research and education in artificial intelligence, focusing specifically on natural language processing, computational linguistics, and data science. The lab's main research areas are Arabic natural language processing, machine translation, text analytics, and dialogue systems.

The interface was presented as part of the International Conference on Computational Linguistics (COLING) 2020. The paper, entitled “A Large-Scale Leveled Readability Lexicon for Standard Arabic” (presented at the 12th Language Resources and Evaluation Conference in Marseille, France), provides further research background on the thesaurus.


About NYU Abu Dhabi

NYU Abu Dhabi is the first comprehensive liberal arts and science campus in the Middle East to be operated abroad by a major American research university. NYU Abu Dhabi has integrated a highly selective liberal arts, engineering, and science curriculum with a world center for advanced research and scholarship, enabling its students to succeed in an increasingly interdependent world and to advance cooperation and progress on humanity’s shared challenges. NYU Abu Dhabi’s high-achieving students have come from 115 nations and speak over 115 languages. Together, NYU’s campuses in New York, Abu Dhabi, and Shanghai form the backbone of a unique global university, giving faculty and students opportunities to experience varied learning environments and immersion in other cultures at one or more of the numerous study-abroad sites NYU maintains on six continents.