Reconocimiento de patrones de habla usando MFCC y RNA

Recognition of speech pattern using MFCC and ANN

  • Olga L. Ramos
  • Diego A. Rojas
  • Leonardo A. Góngora
Keywords: artificial intelligence, MFCC, pattern recognition, speech

Abstract (es_ES, translated)

This work presents the results of the design and development of an artificial-intelligence-based algorithm for recognizing patterns of Spanish-language words, using Mel-Frequency Cepstral Coefficients (MFCC) to represent speech in terms of human auditory perception. Using MFCC made it possible to characterize the voice signals while accounting for possible noise in the recording environment, which helped in obtaining common patterns among these signals when they are disturbed. As a result, a recognition rate above 95% was obtained for the three chosen vowels, in this case /a/, /e/, /o/, with a set of 22 samples per vowel for training and 11 samples for validation. The samples were obtained from 11 people, all male.

Abstract (en_US)

This work presents the results of the design and development of an algorithm based on artificial intelligence and MFCC for recognizing speech patterns. Using MFCC made it possible to characterize voice signals while taking into account the noise in the recording environment, which helps in estimating common patterns among these signals when they present disturbances. As the main result of this work, a recognition rate between 93 and 96% was achieved for the selected vowels (/a/, /e/, /o/). A total of 22 samples were used for training and another 11 for validation. The samples were obtained from 11 test subjects, all of them male.
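
The MFCC representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length (256 samples), sampling rate (8 kHz), Hamming window, 20 mel filters, and 13 coefficients are all assumptions chosen for illustration, and the function names are hypothetical. The sketch computes the coefficients for a single frame using only the Python standard library; the neural network classifier is not shown.

```python
import cmath
import math


def hz_to_mel(f):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope, peak of 1 at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=13):
    """MFCCs of one frame: power spectrum, mel filterbank, log, DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor the energies to avoid log(0) on silent or empty bands.
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(filt, spectrum)), 1e-10))
        for filt in bank
    ]
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
            for j, e in enumerate(log_energies))
        for c in range(num_coeffs)
    ]


# Usage: a synthetic 440 Hz tone stands in for a recorded vowel frame.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]
coeffs = mfcc_frame(tone, sample_rate=8000)
print(len(coeffs))
```

In a recognizer of the kind the abstract describes, a vector like `coeffs` (or a sequence of them, one per frame) would serve as the input features to the classifier. The naive DFT keeps the example dependency-free; a real system would use an FFT routine instead.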



How to cite
Ramos, O., Rojas, D., & Góngora, L. (2016). Reconocimiento de patrones de habla usando MFCC y RNA. Visión Electrónica, 10(1), 5-11.
Published: 2016-06-20
Visión Investigadora