Isolated Digit Filipino Speech Recognition through Spectrogram Image Classification: Towards Application in a Disaster Preparedness Participatory Toolkit



The paper was presented at the 2017 IEEE Region 10 Conference (TENCON 2017).

TENCON 2017 is expected to bring together researchers, educators, students, practitioners, technocrats and policymakers from across academia, government, industry and non-governmental organizations to discuss, share and promote current works and recent accomplishments across all aspects of electrical, electronic and computer engineering, as well as information technology. Distinguished people will be invited to deliver keynote speeches and invited talks on trends and significant advances in the emerging technologies.

Authors:

  • Julie Ann Salido
  • Nathaniel Oco
  • Rachel Edita Roxas
  • Emmanuel Malaay
  • Michael Simora
  • Ronald John Cabatic

Abstract:

In this paper, we present our work on isolated digit speech recognition by classifying spectrogram images, for use in a disaster preparedness participatory toolkit. To achieve higher inclusivity, we included a voice component for wider coverage of respondents, especially those with low literacy and those who are visually impaired. Our methodology treats speech recognition as an image classification task, a deviation from the usual approaches, which normally work on acoustic coefficients and features. As our initial test bed, we focused on the Filipino language, a member of the Malayo-Polynesian language family and the national language of the Philippines. Our data covers 4,297 utterances of the Filipino digits 0 to 9 collected from 262 speakers, and we divided the data into 3 parts: 70% for training, 20% for testing, and 10% for validation. We applied the short-time Fourier transform to our training data and used convolutional neural networks in MATLAB to classify the spectrogram images. The lowest accuracy rate during our tests is 93.02%. Analyses of the results show that background noises are the cause of the misclassified utterances, which is discussed further in this paper. While the results are promising, the work can be extended to include closely related languages.
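
For readers who want a feel for the two preprocessing steps the abstract mentions, the following is a minimal sketch in Python with NumPy, not the authors' MATLAB code. It computes a log-magnitude spectrogram with a short-time Fourier transform and splits item indices 70/20/10. The function names, frame length, hop size, Hann window, sample rate, and random seed are all assumptions chosen for illustration; the abstract does not state the parameters the authors used.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram (frequency bins x frames) via a short-time Fourier transform.

    frame_len and hop are illustrative defaults, not values from the paper.
    The signal must contain at least frame_len samples.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)
    # Convert to decibels; the small constant avoids log(0).
    return 20.0 * np.log10(magnitude + 1e-10).T


def split_70_20_10(n_items, seed=0):
    """Shuffle item indices and split them into 70% train, 20% test, 10% validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_items)
    n_train = n_items * 7 // 10
    n_test = n_items * 2 // 10
    return idx[:n_train], idx[n_train : n_train + n_test], idx[n_train + n_test :]


# Usage: a synthetic 1 kHz tone stands in for a recorded digit utterance.
sample_rate = 8000  # assumed sample rate, for illustration only
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spec = stft_spectrogram(tone)

train_idx, test_idx, val_idx = split_70_20_10(4297)
```

Each spectrogram array would then be saved or rendered as an image and passed to an image classifier such as a convolutional neural network. Because the split rounds down, the validation set here absorbs the few leftover items.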
