Successful Research Year for CAMeL

2020 Lab Achievements


A successful research year for the Computational Approaches to Modeling Language Lab (CAMeL) at NYUAD

Researchers and students in the Computational Approaches to Modeling Language lab (CAMeL) at New York University Abu Dhabi produced 17 publications and released 7 resources in the field of natural language processing in 2020. Some of the papers were accepted for presentation at the following conferences: ACL 2020 (Online), COLING 2020 (Online), and LREC 2020 (cancelled due to COVID-19). We also have one paper in the TALLIP journal and one in Natural Language Engineering. We are also happy to celebrate the first doctoral dissertation by one of our own, now Dr. Nasser Zalmout. Our work was featured 11 times in various media outlets and presented in a number of invited online talks and one tutorial.

Some of these efforts were in collaboration with researchers affiliated with the following institutions:
Carnegie Mellon University in Qatar, the University of British Columbia, American University of Beirut, Indian Institute of Technology - Dhanbad, Université Sorbonne Paris Nord, Ohio State University, Johns Hopkins University, University of Cambridge, and ETH Zürich.

The presented work covers a range of Arabic language processing topics such as syntactic parsing, morphological disambiguation, spelling correction of Arabizi and Arabic dialects, lexical resources, dialect identification, addressing gender bias in AI, dialogue systems, and open source tools for Arabic NLP.

Publications

Computational Morphology
  • Erdmann, Alexander, Micha Elsner, Shijie Wu, Ryan Cotterell, and Nizar Habash. The Paradigm Discovery Problem. In Proceedings of Conference of the Association for Computational Linguistics (ACL 2020), Online.

  • Zalmout, Nasser and Nizar Habash. Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging. In Proceedings of Conference of the Association for Computational Linguistics (ACL 2020), Online.

  • Zalmout, Nasser and Nizar Habash. Utilizing Subword Entities in Character-Level Sequence-to-Sequence Lemmatization Models. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020, Main), Online.

  • Khalifa, Salam, Nasser Zalmout, Nizar Habash. Morphological Analysis and Disambiguation for Gulf Arabic: The Interplay between Resources and Methods. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France.

Computational Syntax
  • Kankanampati, Yash, Joseph Le Roux, Nadi Tomeh, Dima Taji, Nizar Habash. Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020, Main), Online.

  • Taji, Dima and Nizar Habash. PALMYRA 2.0: A Configurable Multilingual Platform Independent Tool for Morphology and Syntax Annotation. In Proceedings of the Fourth Workshop on Universal Dependencies (COLING 2020, Workshop on Universal Dependencies), Online.

Spelling Correction
  • Shazal, Ali, Aiza Usman, Nizar Habash. A Unified Model for Arabizi Detection and Transliteration using Sequence-to-Sequence Models. In Proceedings of the Fifth Arabic Natural Language Processing Workshop (COLING 2020, Arabic NLP Workshop), Online.

  • Eryani, Fadhl, Nizar Habash, Houda Bouamor, Salam Khalifa. A Spelling Correction Corpus for Multiple Arabic Dialects. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, 2020.

Lexical Resources
  • Jiang, Zhengyang, Nizar Habash, and Muhamed Al Khalil. An Online Readability Leveled Arabic Thesaurus. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020, Main), Online.

  • Al Khalil, Muhamed, Nizar Habash, Zhengyang Jiang. A Large-Scale Leveled Readability Lexicon for Standard Arabic. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, 2020.

  • Badaro, Gilbert, Hazem Hajj, and Nizar Habash. "A Link Prediction Approach for Accurately Mapping a Large-scale Arabic Lexical Resource to English WordNet." ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19, no. 6 (2020): 1-38.

Dialect Identification
  • Abdul-Mageed, Muhammad, Chiyu Zhang, Houda Bouamor, Nizar Habash. NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task. In Proceedings of the Fifth Arabic Natural Language Processing Workshop (COLING 2020, Arabic NLP Workshop), Online.

Addressing Bias in AI
Dialogue Systems
  • Chierici, Alberto M., Nizar Habash, Margarita Bicec. The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, 2020.

Open Source Tools
  • Obeid, Ossama, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, Nizar Habash. CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, 2020.

Resources

Tools 
  • CAMeL Tools - an open-source Python Arabic Natural Language Processing (NLP) toolkit for pre-processing, morphological modeling, dialect identification, named entity recognition, and sentiment analysis.

  • Palmyra - a platform-independent dependency annotation tool for morphologically rich languages.

  • The Readability Leveled Arabic Thesaurus - a tool for smart search over related words in Arabic.

Corpora & Lexicons
  • Annotated Gumar Corpus - a morphologically annotated corpus of Gulf Arabic.

  • CODA Correction Corpus - a spelling correction corpus for multiple Arabic dialects.

  • The Margarita Dialogue Corpus - a corpus of dialogues paired with a database of videos. 

  • SAMER Readability Lexicon - a 26,000-lemma resource labelled for readability level.

Talks and Media Coverage

Selected Invited Talks and Tutorials Available Online
Media Coverage
Other Honors