Including research in morphological analysis and disambiguation, syntactic analysis, sentiment analysis, and dialectal processing.
A specific focus on translation for low-resource languages, languages with rich morphologies, and hybrid approaches to machine translation.
Gumar is a morphologically annotated Gulf Arabic (GA) corpus. In its current state, it contains more than 112 million words spanning over 1,200 documents.
Automatic processing of Arabic is challenging for a number of reasons. First, Arabic words are morphologically rich. Second, undiacritized Arabic words are highly ambiguous. This is why morphology, and specifically diacritization, is vital for applications of Arabic Natural Language Processing.
Qusasat, or Arabic Snippets, is an application developed for the 2016 NYUAD Hackathon for Social Good in the Arab World. It won both the First Place and Audience Choice awards.
Cooking recipes exist in abundance, but due to their unstructured text format, they are hard to study quantitatively beyond treating them as simple bags of words. In this paper, we propose an ingredient-instruction dependency tree data structure to represent recipes.
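As a rough illustration of what an ingredient-instruction dependency tree might look like, the sketch below models instructions as internal nodes and ingredients as leaves. The class names (`Ingredient`, `Instruction`), the helper functions, and the sample recipe are all hypothetical and are not taken from the paper; the actual representation may differ.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Ingredient:
    """Leaf node: a raw ingredient (hypothetical representation)."""
    name: str


@dataclass
class Instruction:
    """Internal node: an action applied to ingredients or earlier results."""
    action: str
    inputs: List[Union["Instruction", Ingredient]] = field(default_factory=list)


def ingredients(node):
    """Collect ingredient names under a node, left to right."""
    if isinstance(node, Ingredient):
        return [node.name]
    names = []
    for child in node.inputs:
        names.extend(ingredients(child))
    return names


def depth(node):
    """Number of instruction steps on the longest path to a leaf."""
    if isinstance(node, Ingredient):
        return 0
    return 1 + max((depth(child) for child in node.inputs), default=0)


# A made-up recipe: mix three ingredients, then bake the result.
recipe = Instruction("bake", [
    Instruction("mix", [
        Ingredient("flour"),
        Ingredient("water"),
        Ingredient("yeast"),
    ]),
])
```

Unlike a bag of words, a structure of this kind keeps track of which instruction consumes which ingredients, so quantities such as the number of processing steps can be computed directly from the tree.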
The goal of this work is to build speech-based search engines for low-resource languages. Building such engines poses several challenges; this project focuses on two: mitigating the verbosity of spoken queries, and using speech processing methods that do not require a language model.