Successful Research Year for CAMeL
2020 Lab Achievements
A successful research year for the Computational Approaches to Modeling Language Lab (CAMeL) at NYUAD
Researchers and students in the Computational Approaches to Modeling Language Lab (CAMeL) at New York University Abu Dhabi produced 17 publications and released 7 resources in the field of natural language processing in 2020. Some of the papers were accepted for presentation at the following conferences: ACL 2020 (Online), COLING 2020 (Online), and LREC 2020 (cancelled due to COVID-19). We also have one paper in the TALLIP journal and one in Natural Language Engineering. We are also happy to celebrate the first doctoral dissertation from our lab, by Nasser Zalmout, now Dr. Zalmout. Our work was featured in various media outlets (11 times) and presented through a number of invited online talks and one tutorial.
Some of these efforts were in collaboration with researchers affiliated with the following institutions:
Carnegie Mellon University in Qatar, the University of British Columbia, American University of Beirut, Indian Institute of Technology - Dhanbad, Université Sorbonne Paris Nord, Ohio State University, Johns Hopkins University, University of Cambridge, and ETH Zürich.
The presented work covers a range of Arabic language processing topics such as syntactic parsing, morphological disambiguation, spelling correction of Arabizi and Arabic dialects, lexical resources, dialect identification, addressing gender bias in AI, dialogue systems, and open source tools for Arabic NLP.
Publications
Computational Morphology
Erdmann, Alexander, Micha Elsner, Shijie Wu, Ryan Cotterell, and Nizar Habash. The Paradigm Discovery Problem. In Proceedings of the Conference of the Association for Computational Linguistics (ACL 2020), Online.
Zalmout, Nasser and Nizar Habash. Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging. In Proceedings of the Conference of the Association for Computational Linguistics (ACL 2020), Online.
Zalmout, Nasser and Nizar Habash. Utilizing Subword Entities in Character-Level Sequence-to-Sequence Lemmatization Models. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020, Main), Online.
Khalifa, Salam, Nasser Zalmout, Nizar Habash. Morphological Analysis and Disambiguation for Gulf Arabic: The Interplay between Resources and Methods. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France.
Zalmout, Nasser. "Morphological Tagging and Disambiguation in Dialectal Arabic Using Deep Learning Architectures." PhD diss., New York University Tandon School of Engineering, 2020.
Salloum, Wael and Nizar Habash. "Unsupervised Arabic dialect segmentation for machine translation." Natural Language Engineering. Cambridge University Press. 2020.
Computational Syntax
Kankanampati, Yash, Joseph Le Roux, Nadi Tomeh, Dima Taji, Nizar Habash. Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020, Main), Online.
Taji, Dima and Nizar Habash. PALMYRA 2.0: A Configurable Multilingual Platform Independent Tool for Morphology and Syntax Annotation. In Proceedings of the Fourth Workshop on Universal Dependencies (COLING 2020, Workshop on Universal Dependencies), Online.
Spelling Correction
Shazal, Ali, Aiza Usman, Nizar Habash. A Unified Model for Arabizi Detection and Transliteration using Sequence-to-Sequence Models. In Proceedings of the Fifth Arabic Natural Language Processing Workshop (COLING 2020, Arabic NLP Workshop), Online.
Eryani, Fadhl, Nizar Habash, Houda Bouamor, Salam Khalifa. A Spelling Correction Corpus for Multiple Arabic Dialects. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, 2020.
Lexical Resources
Jiang, Zhengyang, Nizar Habash and Muhamed Al Khalil. An Online Readability Leveled Arabic Thesaurus. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020, Main), Online.
Al Khalil, Muhamed, Nizar Habash, Zhengyang Jiang. A Large-Scale Leveled Readability Lexicon for Standard Arabic. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, 2020.
Badaro, Gilbert, Hazem Hajj, and Nizar Habash. "A Link Prediction Approach for Accurately Mapping a Large-scale Arabic Lexical Resource to English WordNet." ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19, no. 6 (2020): 1-38.
Dialect Identification
Abdul-Mageed, Muhammad, Chiyu Zhang, Houda Bouamor, Nizar Habash. NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task. In Proceedings of the Fifth Arabic Natural Language Processing Workshop (COLING 2020, Arabic NLP Workshop), Online.
Addressing Bias in AI
Alhafni, Bashar, Nizar Habash and Houda Bouamor. Gender-Aware Reinflection using Linguistically Enhanced Neural Models. In Proceedings of the Second Workshop on Gender Bias in Natural Language Processing (COLING 2020, Workshop on Gender Bias in NLP), Online.
Dialogue Systems
Chierici, Alberto M., Nizar Habash, Margarita Bicec. The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, 2020.
Open Source Tools
Obeid, Ossama, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, Nizar Habash. CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, 2020.
Resources
Tools
CAMeL Tools - an open-source Python Arabic Natural Language Processing (NLP) toolkit for pre-processing, morphological modeling, dialect identification, named entity recognition, and sentiment analysis.
Palmyra - a Platform Independent Dependency Annotation Tool for Morphologically Rich Languages.
The Readability Leveled Arabic Thesaurus - A tool for smart search over related words in Arabic.
Corpora & Lexicons
Annotated Gumar Corpus - A Morphologically Annotated Corpus of Gulf Arabic.
CODA Correction Corpus - a spelling correction corpus for multiple Arabic dialects.
The Margarita Dialogue Corpus - a corpus of dialogues paired with a database of videos.
SAMER Readability Lexicon - A 26,000-lemma resource labelled for readability level.
Talks and Media Coverage
Selected Invited Talks and Tutorials Available Online
Habash, Nizar. A Short Introduction to Arabic Natural Language Processing. Online, KAUST Webinar on Machine Learning and Arabic NLP, Nov 30, 2020.
Habash, Nizar. “AI and Arabic: Challenges and Solutions”. (Original title: “الذكاء الاصطناعي واللغة العربية: تحديات وحلول”). Harvard Business Review Arabia Webinar. June 23, 2020.
Habash, Nizar, and Obeid, Ossama. CAMeL Tools: an open-source Python Arabic Natural Language Processing (NLP) toolkit. (Original title: أدوات كامل: مجموعة أدوات مفتوحة المصدر بلغة بايثون لمعالجة اللغة العربية). IWAN Research Group. Nov 11, 2020.
Habash, Nizar, Ossama Obeid, Salam Khalifa, Dima Taji, Bashar Alhafni, Fadhl Eryani, and Go Inoue. Tutorial: Text Analysis of Arabic, as part of the Winter Institute in Digital Humanities 2020. Jan 2020.
Media Coverage
Alittihad - الاتحاد
Alittihad article covering a panel discussion on Arabic in the digital world hosted by the Ministry of Culture and Youth. (Original title: اللغة العربية: أين مكانها في عالم الرقمنة). Dec 24, 2020.
Albayan - البيان
Albayan article covering a panel discussion on Arabic in the digital world hosted by the Ministry of Culture and Youth. (Original title: "تحديات رقمنة المحتوى محور ندوة "أسبوع العربية). Dec 24, 2020.
Alkhaleej - الخليج
Alkhaleej article covering a panel discussion on Arabic in the digital world hosted by the Ministry of Culture and Youth. (Original title: "اللغة العربية والرقمنة.. "حلول لتطوير المحتوى). Dec 23, 2020.
UAE Ministry of Culture and Youth - وزارة الثقافة والشباب لدولة الإمارات
Panel on Arabic Language’s Status in the Digital World. (Original title: اللغة العربية: أين مكانها في عالم الرقمنة). Dec 22, 2020.
Albayan - البيان
Report on the State of the Language Outlines the Future of Arabic. (Original title: تقرير حالة اللغة يرسم ملامح مستقبل العربية). Dec 20, 2020.
Emirates News Agency (WAM) - وكالة أنباء الإمارات
MIT Technology Review - إم آي تي تكنولوجي ريفيو العربية
New York University Abu Dhabi
Programming Prejudice. Dec 10, 2020.
Alkhaleej - الخليج
New York University Abu Dhabi
From NYUAD to Google: Class of 2020 Alumnus Daniel Watson Secures Place in Competitive AI Residency Program. Sep 29, 2020.
The National
Other Honors
WOLDA Logo Awards