Acoustic and Language Modeling for Speech Recognition of a Spanish Dialect from the Cucuta Colombian Region

Juan David Celis Nuñez, Rodrigo Andres Llanos Castro, Byron Medina Delgado, Sergio Basilio Sepúlveda Mora, Sergio Alexander Castro Casadiego

Abstract


 

Context: Automatic speech recognition requires the development of language and acoustic models for different existing dialects. The purpose of this research is to train an acoustic model, a statistical language model, and a grammar language model for Spanish as spoken in the city of San Jose de Cucuta, Colombia, for use in a command control system. Existing models for the Spanish language have problems recognizing the fundamental frequency, spectral content, accent, pronunciation, and tone of Cucuta's dialect, or simply lack a language model suited to it.

Method: In this project, we used a Raspberry Pi B+ embedded system running the Raspbian operating system, a Linux distribution, and two open-source software packages: the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University. This software is based on Hidden Markov Models for the calculation of voice parameters. In addition, we used 1913 audio recordings of the voices of people from San Jose de Cucuta and the Norte de Santander department. These recordings were used for training and testing the automatic speech recognition system.
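To illustrate what a statistical language model estimates, the following is a minimal sketch of maximum-likelihood bigram probabilities computed from a few sentences. It is not the toolkit's implementation: real toolkits also apply smoothing and back-off, which are omitted here. The function name `bigram_probabilities` and the example command phrases are assumptions made for illustration and are not taken from the recorded corpus.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Return maximum-likelihood estimates P(w2 | w1) for each observed bigram."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Sentence boundary markers let the model learn which words start or end a phrase.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        # Only tokens that begin a bigram (all but the last) go into the denominator.
        unigram_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / unigram_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Hypothetical command phrases, used only for illustration.
corpus = ["encender luz", "apagar luz", "encender ventilador"]
probs = bigram_probabilities(corpus)
```

In this toy corpus, "encender" is followed by "luz" in one of its two occurrences, so the estimate for that bigram is one half.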

Results: We obtained a language model consisting of two files: the statistical language model (.lm) and the JSGF grammar model (.jsgf). For the acoustic component, two models were trained; the improved version achieved a 100 % accuracy rate in the training results and an 83 % accuracy rate in the audio tests for command recognition. Finally, we wrote a manual for creating acoustic and language models with the CMU Sphinx software.
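As a rough illustration of what a JSGF grammar file for a command vocabulary can look like, the sketch below builds the grammar text from a list of words. The helper `build_jsgf`, the grammar and rule names, and the command words are all hypothetical; the grammar produced in this work is not reproduced here.

```python
def build_jsgf(grammar_name, rule_name, commands):
    """Build a single-rule JSGF grammar that accepts exactly one of the commands."""
    alternatives = " | ".join(commands)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )


# Hypothetical command words, used only for illustration.
commands = ["encender", "apagar", "subir", "bajar"]
grammar_text = build_jsgf("comandos", "comando", commands)
print(grammar_text)
```

A grammar of this kind restricts the recognizer to the listed alternatives, whereas the statistical model assigns probabilities to word sequences.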

Conclusions: The number of participants in the training of the language and acoustic models has a significant influence on the quality of the recognizer's voice processing. Using a large dictionary for training and a short dictionary containing only the command words for deployment is important for obtaining a better response from the automatic speech recognition system. Given the accuracy rate above 80 % in the voice recognition tests, the proposed models are suitable for applications that assist people with visual or motor impairments.
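One way to derive a short command dictionary from a larger training dictionary is sketched below, assuming the common CMU Sphinx text layout of one word per line followed by its phones, with alternate pronunciations written as `word(2)`. The function `filter_dictionary`, the entries, and the phone symbols are hypothetical and do not come from the dictionary used in this work.

```python
def filter_dictionary(dictionary_lines, command_words):
    """Keep only the pronunciation entries whose word is in command_words."""
    wanted = {word.lower() for word in command_words}
    kept = []
    for line in dictionary_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Strip an alternate-pronunciation suffix such as "(2)" before comparing.
        base_word = parts[0].split("(")[0].lower()
        if base_word in wanted:
            kept.append(line.strip())
    return kept


# Hypothetical entries and phone symbols, used only for illustration.
large_dictionary = [
    "apagar a p a g a r",
    "casa k a s a",
    "encender e n s e n d e r",
    "encender(2) e n s e n d e",
    "",
    "luz l u s",
]
short_dictionary = filter_dictionary(large_dictionary, ["encender", "apagar"])
```

The large dictionary remains available for training, and only the filtered entries would be loaded when recognizing commands.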


Keywords


speech recognition, acoustic models, language models, CMU Sphinx, Raspberry Pi.


DOI: https://doi.org/10.14483/23448393.11616

Creative Commons License

Attribution-NonCommercial-NoDerivatives

Facultad de Ingeniería

Universidad Distrital Francisco José de Caldas

ISSN 0121-750X   E-ISSN 2344-8393

https://doi.org/10.14483/issn.2344-8393