University Courses

Natural Language Processing (CS-UH 2216)


The field of natural language processing (NLP), also known as computational linguistics, is concerned with the modeling and processing of human ("natural") languages. Notable advances in NLP include machine translation (as in Google Translate, which translates among 80 languages) and question answering (as in IBM’s Watson system, which won Jeopardy! in 2011). This course covers foundational NLP concepts and ideas, such as finite-state methods, n-gram modeling, hidden Markov models, part-of-speech tagging, context-free grammars, syntactic parsing, and semantic representations. The course will survey a range of NLP applications such as information retrieval, summarization, and machine translation. Concepts taught in class will be reinforced through hands-on assignments.

Words (CADT-UH 1011)


Words, words, words. Words are the basic units of language. But how do they help us communicate our thoughts? How are they internally constructed? And how do they come together to form complex meanings? How are words from different languages similar, and how are they different? Do words reflect or shape our thought? Do they expand or constrain our imagination? This interdisciplinary course explores what words are and how we think of them. To help answer these questions, the course brings together insights and ideas from a number of fields: linguistics, philosophy, psychology, sociology, computer science, history, literature, religion, and visual arts. Students will read materials from a variety of books and articles and discuss them in class, and they will engage in solving and creating language puzzles. Students will learn how to analyze words in terms of their form, function, and meaning in context. Term projects can range from collection and analysis of linguistic data to multidisciplinary artistic creations.


Habash, Nizar. Introduction to Arabic Natural Language Processing, Synthesis Lectures on Human Language Technologies, Graeme Hirst, editor. Morgan & Claypool Publishers. 187 pages, 2010. (PDF version from Publisher) (Amazon)


This book provides system developers and researchers in natural language processing and computational linguistics with the necessary background information for working with the Arabic language. The goal is to introduce Arabic linguistic phenomena and review the state of the art in Arabic processing. The book discusses Arabic script, phonology, orthography, morphology, syntax, and semantics, with a final chapter on machine translation issues.

The chapter sizes correspond more or less to what is linguistically distinctive about Arabic, with morphology getting the lion's share, followed by Arabic script. No previous knowledge of Arabic is needed. This book is designed for computer scientists and linguists alike. The focus of the book is on Modern Standard Arabic; however, notes on practical issues related to Arabic dialects and languages written in the Arabic script are presented in different chapters.

Table of Contents

  • What is "Arabic?"
  • Arabic Script
  • Arabic Phonology and Orthography
  • Arabic Morphology
  • Computational Morphology Tasks
  • Arabic Syntax
  • A Note on Arabic Semantics
  • A Note on Arabic and Machine Translation

This book was translated into Arabic by Professor Hend Al-Khalifa of King Saud University in Riyadh, Saudi Arabia (2014). (King Saud University Press)


Natural Language Processing of Arabic and its Dialects
Mona Diab and Nizar Habash
EMNLP 2014, NAACL 2012, MEDAR 2009, LREC 2008, NAACL 2007, AMTA 2006
(EMNLP 2014 Video and Slides)

Winter School on Arabic Language Processing
Princess Sumaya University for Technology,
January 27-29, 2014
Nizar Habash and Owen Rambow

Introduction to Arabic Natural Language Processing
Nizar Habash
MEDAR 2009, LREC 2006, ACL 2005, AMTA 2004
(MEDAR 2009 Slides)
(Old videos from a version given at the Johns Hopkins University Summer Workshop in 2005: Part 1, Part 2, and Part 3.)

Arabic Natural Language Processing for Machine Translation
Nizar Habash
AMTA 2012, AMTA 2008
(AMTA 2012 Slides)