Technologies and Applications Review of Subvocal Speech

This paper presents a review of the main applicative and methodological approaches to sub-vocal speech, or silent speech, that have been developed in recent years. Sub-vocal speech can be defined as the identification and characterization of the bioelectric signals that control the vocal tract when the speaker produces no sound. The first section provides an in-depth review of methods for detecting silent speech. The second part evaluates the technologies implemented in recent years, followed by an analysis of the main applications of this type of speech; finally, a broad comparison is presented between the work on these developments carried out in industry and in academia.