Successful Research Year for CAMeL

2019 Lab Achievements

A successful research year for the Computational Approaches to Modeling Language Lab (CAMeL) at NYUAD

Researchers and students in the Computational Approaches to Modeling Language Lab (CAMeL) at New York University Abu Dhabi published 14 papers and released four resources in the field of natural language processing in 2019. Some of the papers were presented at the following conferences: NAACL 2019 (Minneapolis, USA), ACL 2019 (Florence, Italy), MT Summit 2019 (Dublin, Ireland), Interspeech 2019 (Graz, Austria), and EMNLP 2019 (Hong Kong, China). Some of these efforts were carried out in collaboration with researchers from other institutions, including American University of Beirut, Carnegie Mellon University Qatar, Columbia University, Element AI, Google, Ohio State University, and the Qatar Computing Research Institute.

The CAMeL Lab's research areas include developing new artificial intelligence algorithms for language processing, creating resources and tools to support research in computational linguistics, and creating new annotation standards and guidelines, all with a focus on the Arabic language and its dialects.

Publications by Theme

Computational Morphology
  • Adversarial Multitask Learning for Joint Multi-Feature and Multi-Dialect Morphological Modeling by Nasser Zalmout and Nizar Habash. (ACL 2019).
  • A Little Linguistics Goes a Long Way: Unsupervised Segmentation with Limited Language Specific Guidance by Alexander Erdmann, Salam Khalifa, Mai Oudah, Houda Bouamor, and Nizar Habash. (SIGMORPHON 2019, co-located with ACL).
  • Morphologically Annotated Corpora for Seven Arabic Dialects: Taizi, Sanaani, Najdi, Jordanian, Syrian, Iraqi and Moroccan by Faisal Alshargi, Shahd Dibas, Sakhar Alkhereyf, Reem Faraj, Basmah Abdulkareem, Sane Yagi, Ouafaa Kacha, Nizar Habash, and Owen Rambow. (WANLP 2019, co-located with ACL).
Dialect Identification
  • The MADAR Shared Task on Arabic Fine-Grained Dialect Identification by Houda Bouamor, Sabit Hassan, and Nizar Habash. (WANLP 2019, co-located with ACL).
  • ADIDA: Automatic Dialect Identification for Arabic by Ossama Obeid, Mohammad Salameh, Houda Bouamor, and Nizar Habash. (NAACL 2019).
Information Extraction
  • Unsupervised Neologism Normalization Using Embedding Space Mapping by Nasser Zalmout, Kapil Thadani, and Aasish Pappu. (W-NUT 2019, co-located with EMNLP).
  • The Effectiveness of Simple Hybrid Systems for Hypernym Discovery by William Held and Nizar Habash. (ACL 2019).
  • Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities by Alexander Erdmann, David Joseph Wrisley, Benjamin Allen, Christopher Brown, Sophie Cohen-Bodénès, Micha Elsner, Yukun Feng, Brian Joseph, Béatrice Joyeux-Prunel, and Marie-Catherine de Marneffe. (NAACL 2019).
Machine Translation
  • The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation by Mai Oudah, Amjad Almahairi, and Nizar Habash. (MT Summit 2019).
Sentiment Analysis

Speech Evaluation

  • Towards Variability Resistant Dialectal Speech Evaluation by Ahmed Ali, Salam Khalifa, and Nizar Habash. (Interspeech 2019).

Gender Bias in AI

  • Automatic Gender Identification and Reinflection in Arabic by Nizar Habash, Houda Bouamor, and Christine Chung. (GEBNLP 2019, co-located with ACL).

Resources


  • The Margarita Dialogue Corpus - A collection of out-of-context and in-context question-answer pairs for developing time-offset interaction dialogue systems.