Simplification of Arabic Masterpieces for Extensive Reading

The main objective of the SAMER project is to create a standard and tools for the simplification of modern fiction in Arabic for school-age learners. The project first developed a five-level prototypical readability scale. It then produced a curated Arabic readability list from a general-purpose corpus of Arabic (half news and half fiction), graded the list on that scale based on frequency of occurrence in the corpus, and had it manually annotated in triplicate by language professionals from three dialectal regions of the Arab world. In the next stage, the project drew on this readability list to design and publish a 36k-word Readability-leveled Thesaurus for Arabic and to build a Simplification Interface platform as an extension to Google Docs. In the last stage, the system was used to simplify fifteen 10k-word texts from Arabic fictional masterpieces, producing a readability-graded corpus of modern Arabic fiction.
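
The frequency-based grading and triplicate annotation described above can be illustrated with a minimal sketch. It is not the project's actual procedure: the equal-sized rank bins, the median rule for combining three annotators, and the function names `frequency_levels` and `merge_annotations` are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def frequency_levels(tokens, n_levels=5):
    """Assign each word type a level from 1 (most frequent) to n_levels.

    Assumption: levels are equal-sized bins over the frequency ranking.
    The source does not say how frequency was mapped to the five levels.
    """
    counts = Counter(tokens)
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    size = len(ranked)
    return {w: 1 + (i * n_levels) // size for i, w in enumerate(ranked)}


def merge_annotations(a, b, c):
    """Combine three annotators' levels for one word.

    Assumption: the median of the three judgments is used as the final level.
    """
    return int(median([a, b, c]))


# Toy corpus with five word types of decreasing frequency.
tokens = "a a a a a b b b b c c c d d e".split()
levels = frequency_levels(tokens)
final_level = merge_annotations(2, 3, 5)
```

With a real corpus, the tokens would be Arabic lemmas rather than the placeholder letters used here, and the manual annotations would override or adjust the frequency-based starting level.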

Intended Timeline

  • build a corpus of Arabic texts used in official school curricula;
  • analyze it for features of text difficulty using computational methods and tools;
  • generate a Graded Reader Scale (GRS) mirroring readability levels based on school curricula to assist in the simplification process; and
  • attract talented writers from around the Arab World to simplify selected works of Arabic fiction according to our GRS levels and guidelines.

A screenshot of the SAMER Simplification Interface at work, highlighting Arabic words according to their readability.
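
The kind of lookup behind such highlighting can be sketched as follows. This is a hypothetical illustration, not the interface's code: the names `tag_words` and `words_to_simplify`, the whitespace tokenization, the sample level values, and the choice to flag words missing from the list are all assumptions.

```python
def tag_words(text, levels):
    """Pair each whitespace-separated word with its readability level.

    Words missing from the list get None. A real system would lemmatize
    Arabic words before lookup; plain splitting is a simplification.
    """
    return [(word, levels.get(word)) for word in text.split()]


def words_to_simplify(tagged, target_level):
    """Return words above the target level, plus words with no known level.

    Assumption: unknown words are flagged for review by the writer.
    """
    return [w for w, lvl in tagged if lvl is None or lvl > target_level]


# Hypothetical entries; the level values are invented for the example.
sample_levels = {"قرأ": 1, "كتاب": 1, "مخطوطة": 4}
tagged = tag_words("قرأ كتاب مخطوطة نادرة", sample_levels)
flagged = words_to_simplify(tagged, target_level=2)
```

Here `flagged` holds the level-4 word and the word absent from the list, which are the ones a writer simplifying to level 2 would want to revisit.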

Contact

Dr. Muhamed O. Al-Khalil
Email: muhamed.alkhalil@nyu.edu
Phone: +971 2 628-4112

NYUAD Humanities Building (A6), Room #1139
PO Box 129188
Saadiyat Island
Abu Dhabi, United Arab Emirates