Research

Areas

The following are the main research areas in CAMeL Lab:

Arabic Natural Language Processing

Including research in morphological analysis and disambiguation, syntactic analysis, sentiment analysis, and dialectal processing.

Machine Translation

A specific focus on translation for low-resource languages, morphologically rich languages, and hybrid approaches to machine translation.

Information Retrieval

With a specific focus on speech retrieval.

Spoken Document Retrieval

Building speech-based search engines for low-resource languages.

Other Research in NLP/CL

Project Name: Description
SIMMR: Predicting the structure of cooking recipes.
Qusasat: Qusasat, or Arabic Snippets, is an application developed for the 2016 NYUAD Hackathon for Social Good in the Arab World. It won both the First Place and Audience Choice awards.

Research Highlights

Gumar Corpus

Gumar is a morphologically annotated Gulf Arabic (GA) corpus. In its current state, it contains more than 112 million words spanning over 1,200 documents.

Morphological Analysis of Arabic

Automatic processing of Arabic is challenging for a number of reasons. First, Arabic words are morphologically rich. Second, undiacritized Arabic words are highly ambiguous. This is why morphology, and specifically diacritization, is vital for applications of Arabic Natural Language Processing.
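The ambiguity of words written without diacritics can be illustrated with a small sketch. The three example words and their glosses below are chosen for illustration and are not taken from the lab's tools or data; the helper `strip_diacritics` is a hypothetical name. The code removes Unicode combining marks, which include the Arabic short-vowel diacritics, and shows that distinct words collapse to a single written form.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Drop Unicode combining marks, which include Arabic short-vowel diacritics."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))


# Illustrative diacritized words, written with escapes so each code point is explicit.
# Glosses are approximate.
diacritized = {
    "\u0643\u064e\u062a\u064e\u0628\u064e": "kataba, 'he wrote'",
    "\u0643\u064f\u062a\u0650\u0628\u064e": "kutiba, 'it was written'",
    "\u0643\u064f\u062a\u064f\u0628": "kutub, 'books'",
}

# Group the glosses by the form that remains once diacritics are removed.
readings = defaultdict(list)
for word, gloss in diacritized.items():
    readings[strip_diacritics(word)].append(gloss)

for glosses in readings.values():
    print(f"{len(glosses)} readings share one undiacritized form: {glosses}")
```

A disambiguation system has to recover which of these readings is intended from context, which is one reason diacritization matters for downstream applications.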

Qusasat

Qusasat, or Arabic Snippets, is an application developed for the 2016 NYUAD Hackathon for Social Good in the Arab World. It won both the First Place and Audience Choice awards.

SIMMR

Cooking recipes exist in abundance, but due to their unstructured text format, they are hard to study quantitatively beyond treating them as simple bags of words. In this paper, we proposed an ingredient-instruction dependency tree data structure to represent recipes.
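As a rough illustration of the idea, and not the representation defined in the paper, a recipe tree could be sketched as follows. The sketch assumes that ingredients are leaves and that each instruction is an internal node whose children are the ingredients or intermediate results it acts on. The class name `Node`, the `kind` labels, and the toy recipe are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    kind: str  # assumed labels: "ingredient" (leaf) or "instruction" (internal node)
    children: List["Node"] = field(default_factory=list)


def ingredients(node: Node) -> List[str]:
    """Collect ingredient labels under a node, left to right."""
    if node.kind == "ingredient":
        return [node.label]
    found: List[str] = []
    for child in node.children:
        found.extend(ingredients(child))
    return found


def depth(node: Node) -> int:
    """Number of nodes on the longest path from this node down to a leaf."""
    return 1 + max((depth(child) for child in node.children), default=0)


# Toy recipe: mix flour and eggs, then fry the mixture in butter.
mix = Node("mix", "instruction", [Node("flour", "ingredient"), Node("eggs", "ingredient")])
recipe = Node("fry", "instruction", [mix, Node("butter", "ingredient")])

print(ingredients(recipe))
print(depth(recipe))
```

Unlike a bag of words, a structure of this kind records which ingredients are combined by which step, so properties such as tree depth or the number of ingredients per instruction can be measured directly.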

Spoken Document Retrieval

The goal of this work is to build speech-based search engines for low-resource languages. There are several challenges in building such engines; this project focuses on two: mitigating the verbosity of spoken queries, and using speech processing methods that do not require a language model.