Spoken Document Retrieval
The goal of this work is to build speech-based search engines for low-resource languages. There are several challenges in building such engines; this project focuses on two: mitigating the verbosity of spoken queries, and using speech-processing methods that do not require a language model.
We find that spoken queries tend to be much longer than their written counterparts. This is both because of the nature of speech (think of the last time you asked something of a customer service agent) and because of the nature of our users: many of our subjects have not been trained by Google to use one- or two-word queries. Our goal is thus to develop systems that can distill a user's information need as the query proceeds.