Noise-resistant Telephone Quality Isolated Digit ASR: Towards Application in a Disaster Participatory Toolkit


This paper appears in the Proceedings of the 21st Oriental COCOSDA Workshop.

COCOSDA, the International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, was established in 1991 to promote international cooperation in developing speech corpora and coordinating assessment methods for speech input/output systems. In 1994 it was proposed that a sub-organization for the Oriental community be established to share linguistic features unique to the region. After a preparatory meeting held by interested members in Hong Kong in 1997, annual meetings have been held since 1998. The community has enjoyed increasing participation and enthusiastic interest in organizing future meetings, which bodes well for sustained activity. The purpose of Oriental COCOSDA is to exchange ideas, share information, and discuss regional matters concerning the creation, utilization, and dissemination of spoken language corpora of oriental languages and the assessment methods of speech recognition/synthesis systems, as well as to promote speech research on oriental languages.


  • Emmanuel Malaay
  • Ronald John Cabatic
  • Michael Simora
  • Shrestha Mohanty
  • Justin Mi
  • Jonathan Lee
  • Tanatcha Panpairoj
  • Sirej Dua
  • Brandie M. Nonnecke
  • Camille Crittenden
  • Ken Goldberg
  • Nathaniel Oco
  • Rachel Edita Roxas


We present our work on developing an isolated digit Automatic Speech Recognizer (ASR) covering five languages spoken in the Philippines: Filipino, Ilocano, Cebuano, English, and Spanish-borrowed cardinal numbers. The ASR recognizes quantitative responses for a disaster participatory toolkit called Malasakit (a Filipino term which means “sincere care”). To make the toolkit inclusive, the ASR was designed to be employed in an Interactive Voice Response (IVR) system by integrating it with Twilio, a web service API that can receive calls and connect to the Malasakit database. The speech corpus of the ASR was collected from 296 speakers, with a total duration of 8 hours, 53 minutes, and 48 seconds; the recordings were decimated from wideband quality (16 kHz) to telephone quality (8 kHz). To make the ASR noise-resistant, the telephone-quality corpus was contaminated with channel and background noise (e.g., busy road, marketplace, construction). For future work, we suggest applying these methods to continuous speech recognition with telephone-quality speech corpora, which could support the analysis of qualitative responses.
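The two corpus-preparation steps named in the abstract, decimation from 16 kHz to 8 kHz and contamination with background noise, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`lowpass_fir`, `decimate_by_2`, `mix_at_snr`), the filter length, the cutoff, and the idea of mixing at a chosen signal-to-noise ratio are all assumptions made for illustration, since the abstract does not state the tools or noise levels used.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter.

    cutoff is a fraction of the input sample rate (0 to 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def decimate_by_2(samples, num_taps=63):
    """Halve the sample rate (e.g. 16 kHz -> 8 kHz).

    Low-pass filters first to limit aliasing, then keeps every 2nd sample.
    The tap count and cutoff are illustrative assumptions.
    """
    # Cutoff just below the new Nyquist frequency (0.25 of the old rate).
    taps = lowpass_fir(num_taps, cutoff=0.225)
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k  # centered filter, so no added delay
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the speech-to-noise power ratio is snr_db.

    The noise is looped if it is shorter than the speech.
    """
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Under these assumptions, a wideband recording would first pass through `decimate_by_2` and then through `mix_at_snr` with a noise clip (for instance a marketplace recording already at 8 kHz) and a chosen SNR in decibels. In practice a library routine such as `scipy.signal.decimate` would likely replace the hand-written filter.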
