Successful Research Year for CAMeL

2020 Lab Achievements


A successful research year for the Computational Approaches to Modeling Language Lab (CAMeL) at NYUAD

Researchers and students in the Computational Approaches to Modeling Language Lab (CAMeL) at New York University Abu Dhabi published 17 papers and released 7 resources in natural language processing in 2020. Some of our papers were accepted for presentation at the following conferences: ACL 2020 (online), COLING 2020 (online), and LREC 2020 (cancelled due to COVID-19). We also published one paper in the TALLIP journal and one in Natural Language Engineering. We are also happy to celebrate the lab's first doctoral dissertation, completed by our own Nasser Zalmout, now Dr. Nasser Zalmout. Our work was featured 11 times in media outlets and was presented in a number of invited online talks and one tutorial.

Some of these efforts were in collaboration with researchers affiliated with the following institutions:
Carnegie Mellon University in Qatar, the University of British Columbia, American University of Beirut, Indian Institute of Technology - Dhanbad, Université Sorbonne Paris Nord, Ohio State University, Johns Hopkins University, University of Cambridge, and ETH Zürich.

The presented work covers a range of Arabic language processing topics, such as syntactic parsing, morphological disambiguation, spelling correction of Arabizi and Arabic dialects, lexical resources, dialect identification, addressing gender bias in AI, dialogue systems, and open-source tools for Arabic NLP.

Publications

Computational Morphology
  • Erdmann, Alexander, Micha Elsner, Shijie Wu, Ryan Cotterell, and Nizar Habash. The Paradigm Discovery Problem. In Proceedings of the Conference of the Association for Computational Linguistics (ACL 2020), Online.

  • Zalmout, Nasser and Nizar Habash. Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging. In Proceedings of the Conference of the Association for Computational Linguistics (ACL 2020), Online.

  • Zalmout, Nasser and Nizar Habash. Utilizing Subword Entities in Character-Level Sequence-to-Sequence Lemmatization Models. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020, Main), Online.

  • Khalifa, Salam, Nasser Zalmout, and Nizar Habash. Morphological Analysis and Disambiguation for Gulf Arabic: The Interplay between Resources and Methods. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France.

Computational Syntax
Spelling Correction
Lexical Resources
Dialect Identification
Addressing Bias in AI
Dialogue Systems
  • Chierici, Alberto M., Nizar Habash, and Margarita Bicec. The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France.

Open Source Tools
  • Obeid, Ossama, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, and Nizar Habash. CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France.

Resources

Tools
  • CAMeL Tools - an open-source Python Arabic Natural Language Processing (NLP) toolkit for pre-processing, morphological modeling, dialect identification, named entity recognition, and sentiment analysis.

  • Palmyra - a platform-independent dependency annotation tool for morphologically rich languages.

  • The Readability Leveled Arabic Thesaurus - a tool for smart search over related words in Arabic.

Corpora & Lexicons
  • Annotated Gumar Corpus - a morphologically annotated corpus of Gulf Arabic.

  • CODA Correction Corpus - a spelling correction corpus for multiple Arabic dialects.

  • The Margarita Dialogue Corpus - a corpus of dialogues paired with a database of videos.

  • SAMER Readability Lexicon - a 26,000-lemma resource labelled for readability level.

Talks and Media Coverage

Selected Invited Talks and Tutorials Available Online
Media Coverage
Other Honors