Biomedical Journal of Scientific & Technical Research

February, 2020, Volume 27, 2, pp 20625-20627

Mini Review

Silent Speech Recognition for BCI - A Review

You Wang1, Ming Zhang1, Ruifen Hu1, Nan Li2 and Guang Li1*

Author Affiliations

1State Key Laboratory of Industrial Control Technology, Institute of Cyber Systems and Control, Zhejiang University, China

2Department of Engineering, University of Cambridge, UK

Received: April 16, 2020 | Published: April 27, 2020

Corresponding author: Guang Li, State Key Laboratory of Industrial Control Technology, Institute of Cyber Systems and Control, Zhejiang University, Hangzhou 310027, Zhejiang, China

DOI: 10.26717/BJSTR.2020.27.004478

Abstract

Silent Speech Interface (SSI) is a novel member of the Brain-Computer Interface (BCI) family that relies on decoding speech-related bio signal activities or articulator motions. It performs recognition or synthesis on data collected from the articulators using a variety of sensors. Research on SSI is now an active interdisciplinary field combining neuroscience, computer science and engineering. This review presents the current advances and critical issues in the development of SSI. Relevant methods, practices and challenges are also covered.

Keywords: Silent Speech Interface; Bio signal Activities; Articulator Motions

Abbreviations: SSI: Silent Speech Interface; BCI: Brain-Computer Interface; EMG: Electromyography; HMM: Hidden Markov Model; DTW: Dynamic Time Warping; LDA: Linear Discriminant Analysis

Introduction

Acoustic speech is the most common and comfortable method of communication among humans. Starting from the brain, acoustic signals are produced by the coordinated work of vocal organs such as the lungs, larynx and throat. Bio signals and articulatory activities accompany these physiological processes and have proved useful for interpreting speech production [1]. A typical approach is to use such speech-related bio signals or articulatory motions as an alternative input for speech recognition. Unlike a conventional Brain-Computer Interface (BCI), which directly decodes cortical brain activity, a Silent Speech Interface (SSI) uses neuromuscular or articulator activities to trace back indirectly to the neural information [2,3]. As a relatively novel form of BCI, SSI has advantages in implementation and application. Compared with conventional BCI, it requires fewer signal channels, and signal detection is much more convenient because the data are usually recorded on the muscle surface or in a non-contact way [4]. SSI also has unique advantages over acoustic speech interfaces. First, it mostly depends only on the relevant electrophysiological activities and is not disturbed by ambient noise, so it works well in noisy conditions. Second, because SSI does not require audible voice, the privacy of communication can be protected and people nearby are not disturbed, especially in public areas. Finally, SSI is an excellent option for helping people with speech disabilities (e.g., laryngectomy patients or people who are mute) and for language training [1,5,6]. Figure 1 shows the overall diagram of SSI, from data acquisition to the final applications.

Figure 1: Silent speech interface diagram.

Data Acquisition

Several techniques have been employed to capture different speech-related bio signals, including articulatory and muscular activity. Articulatory movements (e.g., of the lips or tongue) usually accompany vocalized or whispered speech [7]. The more popular approach relies on neuromuscular activity, which is present in both vocalized and silent speech [5]. For measuring the motion of articulators, imaging and surface sensing are the two methods commonly used in SSI. Video and ultrasound imaging can directly record the activities of visible or invisible speech articulators [8]. Because of its good safety and temporal resolution, imaging is well suited to vocal tract analysis under certain clinical conditions. Vibration or magnetic sensors used to capture articulatory activity cause less signal crosstalk in multichannel systems and require less skin preparation [7-9]. However, methods based on moving articulators do not work well in purely silent speech.

Both invasive and noninvasive methods have been used to record muscular activity, a measurement usually called Electromyography (EMG) [1]. In the invasive approach, needle electrodes are inserted into muscle tissue and obtain high-quality EMG with excellent spatial and temporal resolution. Medical expertise is therefore required, and the approach is unsuitable for frequent use. Noninvasive EMG sensors are more popular for SSI. In accessible areas, usually on and around the face, surface electrodes are placed on particular muscles or in designated grids to obtain surface EMG (sEMG) data [10-12]. Because of the skin and tissue between the electrodes and the muscles, the measured signals actually represent a mixture of signals from several muscles. Although the signal quality is relatively poor, sEMG is still preferred because of its convenience and hygiene in use.
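
The mixing of muscle signals at a surface electrode can be illustrated with a toy example. The sketch below assumes a linear, instantaneous mixing model, which is a common simplification and is not stated in this review; the function mix, the two source signals and the attenuation weights are all hypothetical values chosen for illustration.

```python
def mix(sources, weights):
    """Return one electrode channel per weight row.

    Each channel is a weighted sum of the source signals, sample by sample.
    This linear instantaneous model is an assumption made for illustration.
    """
    n_samples = len(sources[0])
    return [
        [sum(w * s[t] for w, s in zip(row, sources)) for t in range(n_samples)]
        for row in weights
    ]


# Two hypothetical muscle sources (arbitrary units), four samples each.
muscle_a = [1.0, 0.0, 1.0, 0.0]
muscle_b = [0.0, 2.0, 0.0, 2.0]

# Hypothetical attenuation weights: electrode 1 sits nearer muscle A,
# electrode 2 sits nearer muscle B.
weights = [
    [0.5, 0.25],
    [0.25, 0.5],
]

channels = mix([muscle_a, muscle_b], weights)
print(channels)  # [[0.5, 0.5, 0.5, 0.5], [0.25, 1.0, 0.25, 1.0]]
```

In this toy case the first channel is flat even though both muscles are active, which shows why a single surface channel cannot always be attributed to a single muscle.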

Methods

Only a few studies on SSI are based on articulator motions. Ultrasonic and optical images are processed to obtain feature sets, and the speech information is then recognized via silent vocoders [8,13-14]. A flexible, skin-attachable vibration sensor can quantitatively perceive the human voice through the demonstrated linear relationship between voice pressure and neck skin vibration [9]. In the magnetic technique, permanent magnets are attached to the speech articulators and the resulting magnetic field is measured to achieve speech recognition with a Hidden Markov Model (HMM), Dynamic Time Warping (DTW) or other algorithms [15-17]. In EMG-based SSI, syllables, phonemes and spectra are used for recognition in the time, frequency and time-frequency domains. Initially, Linear Discriminant Analysis (LDA) and HMM were used as classifiers [1,18]. More recent approaches rely on artificial neural networks, which perform well in related studies [19].
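
To make the DTW idea concrete, the following minimal sketch classifies a one-dimensional sensor sequence by its DTW distance to stored word templates. It is an illustration under stated assumptions and not the method of any cited study: the functions dtw_distance and classify, the absolute-difference cost, and the template values are all hypothetical, and real systems work on multichannel feature sequences.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the absolute difference as local cost and allows the three
    standard steps (insertion, deletion, match).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(sequence, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))


# Hypothetical one-dimensional templates for two words.
templates = {
    "yes": [0.0, 1.0, 2.0, 1.0, 0.0],
    "no": [2.0, 1.0, 0.0, 1.0, 2.0],
}

# The same shape as "yes", but articulated more slowly (samples repeated).
query = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
print(classify(query, templates))  # yes
```

Because DTW lets samples repeat during alignment, the slower utterance still matches its template with zero cost, which is the property that makes it useful when speaking rate varies between trials.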

Output and Potential Applications

SSI can produce output in two forms: silent speech recognition as text, and synthetic speech as voice [6]. The choice depends on practical requirements. Potentially, SSI can be implemented in the following circumstances:
a) Medical prostheses control and speech training by patients with speech disabilities
b) Hands-free peripheral device control
c) Communication in privacy or noisy ambience [20-24].

Open Challenges

Substantial progress has been achieved in recent years, especially in data acquisition approaches and algorithms. However, some common challenges remain. As in conventional speech recognition, large-vocabulary datasets with high recognition performance are required. Subject or speaker independence is another concern. Recognition depends closely on the speaker's anatomy, so differences in muscular movement between individuals and in sensor position between trials may affect the accuracy of SSI. For practical use, conveniently wearable recording systems that work robustly are required.

Conclusion

This paper presents an overview of the silent speech interface, a newly proposed and promising technology. Signal acquisition approaches, recognition methods and challenges of SSI are briefly introduced. Currently, SSI accuracy of more than 90% has been achieved with a reasonable sEMG data size. This noninvasive and convenient approach is a promising method for BCI.

Acknowledgement

This work is supported by Zhejiang University Education Foundation Global Partnership Fund.

Conflict of interest

The authors declare that they have no competing interests.

References
