Project Description
Corpus Collection
A unique genre of written material, particularly associated with Gulf Arabic (GA), is the long conversational novel published anonymously and publicly online. We found a large collection of these novels in a single online location and automatically downloaded about 1,200 MS Word documents. Such novels are usually written as lengthy threads in online forums. The data we obtained had been compiled into MS Word documents by volunteer forum members and then published in an organized manner by another member.
Corpus Genre
Most of the novels are romantic in theme, though they also include drama and sometimes tragedy. The structure of a novel is simple. It starts with a brief introduction that contains the title of the novel, the writer's pen name (no real names are used), and the country of the novel. The introduction is followed by a prologue that usually contains a short piece of dialectal poetry or a short piece of literary writing, usually in Modern Standard Arabic (MSA). The prologue also contains a brief description of the novel's characters, though some writers prefer to introduce each character when their role first appears. Then comes the main body of the novel, which is often a dialogue between the characters, with some pieces of narration between conversations in either the dialect or MSA. The last part of the novel usually contains "moral" lessons narrated by the writer. Writers also tend to ask the audience for constructive criticism and opinions, and whether they should continue writing more novels.
The target audience is mainly teenage girls. Publication of the novels is highly interactive and depends on the activity of the audience.
Researchers
Publications
- Salam Khalifa, Nizar Habash, Fadhl Eryani, Ossama Obeid, Dana Abdulrahim and Meera Al Kaabi: A Morphologically Annotated Corpus of Emirati Arabic. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.
- Salam Khalifa, Sara Hassan and Nizar Habash: A Morphological Analyzer for Gulf Arabic Verbs. In Proceedings of WANLP 2017 (co-located with EACL 2017), Valencia, Spain, 2017.
- Salam Khalifa, Nizar Habash, Dana Abdulrahim and Sara Hassan: Gumar: A Large Scale Corpus of Gulf Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC 2016), Portorož, Slovenia, 2016.
Acknowledgments
We wish to thank all the writers of the novels, who write under pen names, for sharing their work publicly. We would also like to thank the Graaam forum members who collected the scattered novels, put them together in MS Word files, and published them online.
We also thank the Curras members for sharing their web interface code that we built on to produce this website.