
  1. (Department of Electronics and Communication, SRM Institute of Science and Technology, Chennai, India {joshuaj@srmist.edu.in} )
  2. (Department of Electronics and Communication, SRM Institute of Science and Technology, Chennai, India {vijayakp@srmist.edu.in} )
  3. (Belgrade Metropolitan University, Serbia jovana.jovicic@metropolitan.ac.rs)
  4. (University of Nis, Serbia miroslav.trajanovic@gmail.com )



Keywords: AWS Lambda, Brain-computer interface, Cloud computing, CNN, EEG signal, IoT, Imagined speech to text

1. Introduction

The brain-computer interface (BCI) is a cutting-edge technology that helps physically challenged people interact with the world. Reading and analyzing various properties of the brain, such as electrical activity, magnetic activity, and blood oxygen levels, helps in understanding brain activities such as alertness, focus level, sleep cycle, and even motor control signals [1]. These motor control signals actuate muscles throughout the body. Many studies have shown that extracting and analyzing the signals associated with speech articulation can help identify the word being imagined in the brain [2-4].

This field of research is still in its infancy: only six words have been identified so far [5], and increasing the number of words decreases the accuracy. Moreover, the duration of the imagined word varies from time to time and from person to person. Another challenge in EEG signal processing is tagging the signal with the actual word. Unlike an audio signal, the EEG is not easy to tag because the word imagined by the participant is not directly observable.

Moreover, there is a considerable gap between the stimulus and the response. The response carries no indicator or marker to specify where the cue of the stimulus starts and where it ends. A further challenge in developing a system that converts an imagined word into text or sound is the computational complexity of the algorithms that process and classify the EEG signals. Ideally, a handheld device would only require the deployment of the application, but handheld and portable devices have low throughput, making them less suitable for this type of workload.

This article contributes the following: (i) a mechanism to synchronize stimuli and responses by adding an indicator channel to the EEG signal, marking where the imagined speech starts and ends, to increase word-detection accuracy; (ii) a method of repeating the same imagined word over different experiment durations to improve data reliability; (iii) a method of segmenting the EEG data to enhance accuracy as the vocabulary is increased to 10 words; (iv) an IoT framework with cloud computing to handle the computational complexity.

IoT and cloud computing have become more versatile and provide stable solutions [6]. Available web services include Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, IBM Cloud, Jelastic, DigitalOcean, and Salesforce. The system in this paper uses AWS because it is a reliable, scalable, and inexpensive cloud computing service. The remainder of the article is organized as follows: Section 2 summarizes the related work, while Section 3 details the materials and methods followed in this paper. The results are reported in Section 4, and Section 5 discusses the opportunities and challenges for future BCI research on imagined speech. Section 6 concludes and summarizes the work done.

2. Related Work

Many studies have provided evidence for the relationship between brain activities and their corresponding outcomes, such as motor actions and speech [7-9]. The technologies involved in BCI include electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), near-infrared spectroscopy (fNIR), and electrocorticography (ECoG) [10].

fNIR and fMRI are used to find the part of the brain that is active while performing a particular task. The active brain region consumes more oxygen, which can be monitored by fNIR. In fNIR, the blood oxygen level is measured through an infrared source and detector placed on the scalp in a particular order; a change in infrared light intensity is converted to the level of oxygenated and deoxygenated hemoglobin [11,12]. The human auditory system was mapped to Broca's and Wernicke's areas of the brain using this information. On the other hand, these methods lack the temporal resolution required to record imagined speech. MEG is a non-invasive method but requires a bulky fixed instrument, which makes the experimental procedure expensive and complex.

ECoG has a very high temporal resolution and less noise because the electrodes are placed inside the skull, on the surface of the brain. However, electrode placement requires surgery and may scar the brain surface, so this method carries greater risk [13].

Among these BCI technologies, EEG stands out for its low cost, low risk, wearability, and excellent temporal resolution [14,15]. The signals are captured while the subject performs a task. In the proposed system, the task was to imagine a word, with or without articulating the sound, after an audio or visual cue. The maximum number of words processed thus far is only six, which is very limited for daily usage.

The support vector machine (SVM) is a widely used algorithm for EEG signal processing [16,17]. However, before any machine-learning algorithm is applied, the data needs to be free of noise and artifacts. The usual pre-processing methods are the bandpass filter and common spatial patterns (CSP).

All the algorithms above require machines with high processing capacity. An IoT-based cloud computing system can provide a solution for these processing-hungry applications. Such a system also provides a centralized kernel (the learning and classification algorithms) that is updated with each new data entry. Owing to this centralization, learning and classification occur uniformly across all systems connected to the cloud. Another advantage of the cloud system is scalability, which enables the amount of training data to be increased if the classification accuracy is poor.

3. Materials and Methods

3.1 Data Acquisition

EEG signals were collected from 18 right-handed volunteers aged between 13 and 51; six were male, and 12 were female. Each volunteer was asked to rest in a comfortable chair in a calm room and to repeatedly imagine one of the following words: ‘Bath,’ ‘Cold,’ ‘Doctor,’ ‘Food,’ ‘Hot,’ ‘No,’ ‘Pain,’ ‘Toilet,’ ‘Water,’ and ‘Yes.’ Each volunteer's data for a given word was acquired over one-, five-, and 10-second intervals to improve the reliability of the data. Each time they imagined a word, the EEG signal was captured using a NeuroSky MindWave Mobile 2 device, a single-channel EEG headset with 12-bit resolution and a maximum sampling rate of 512 Hz. The raw data was then transferred to a Raspberry Pi 3 as packets over Bluetooth.

On the Raspberry Pi 3, Python code was used to extract the raw EEG data from the received packets. Part of the code removed outlier values and eye-blinking artifacts during pre-processing of the data.
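The paper does not publish its pre-processing code; a minimal pure-Python sketch of the kind of outlier and blink removal described here might look as follows. The thresholds are illustrative assumptions, not the study's actual values:

```python
from statistics import median, pstdev

def remove_artifacts(raw, blink_threshold=150.0, z_max=4.0):
    """Remove eye-blink spikes and statistical outliers from raw EEG.

    `blink_threshold` (microvolts) and `z_max` are hypothetical values
    chosen for illustration only.
    """
    med = median(raw)
    # Eye blinks appear as large-amplitude spikes: replace any sample
    # far from the trial median with the median itself.
    clipped = [med if abs(v - med) > blink_threshold else v for v in raw]
    # Drop remaining outliers more than z_max standard deviations away.
    mean = sum(clipped) / len(clipped)
    std = pstdev(clipped) or 1.0
    return [v for v in clipped if abs(v - mean) / std <= z_max]
```

In practice the blink threshold would be tuned per subject, since blink amplitude varies between participants.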

As mentioned in Section 1 (Introduction), the EEG data requires synchronization between stimuli and responses. As a solution, the participants were asked to imagine the word after each bell sound. This bell sound, recorded by a microphone with the same timestamp, was later added as a second channel to the recorded response. The audio signal was used as a synchronizing pulse for each imagined word.

Accordingly, a participant repeats a word twice in a five-second trial and four times in a 10-second trial. Therefore, the response with an indicator can be segmented into separate samples. A comparison was made between ‘without indicator’ and ‘with indicator’; Table 1 lists the performance for five-second trials.
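Assuming the bell onsets have already been located in the indicator channel, the segmentation step can be sketched as follows. The per-word window length is a hypothetical parameter; the paper does not state one, though two repetitions per five-second trial imply roughly 2.5 seconds each:

```python
FS = 512  # NeuroSky MindWave sampling rate, Hz

def segment_by_indicator(eeg, onsets, word_seconds=2.5):
    """Slice one trial into per-word samples.

    eeg          : raw samples for the whole trial
    onsets       : sample indices where the bell (indicator) fired
    word_seconds : assumed window per imagined word (hypothetical)
    """
    win = int(word_seconds * FS)
    # Keep only windows that fit entirely inside the trial.
    return [eeg[i:i + win] for i in onsets if i + win <= len(eeg)]
```

Each returned slice is then treated as an independent sample, which is how a single five-second trial yields two samples.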

Here, the sample size without an indicator is 180 because the complete trial is considered one sample. For the remainder of this study, the samples were collected with indicators.

The volunteers' data for a given word was acquired over one-, five-, and 10-second intervals. The one-second samples were too short for the participant to imagine the word and were therefore discarded. For the five- and 10-second intervals, the samples were collected with an indicator. The total number of samples is calculated as follows:

Sample size = (Number of participants) × (Number of words) × (Number of repetitions per trial).

That is, 18 × 10 × 2 = 360 samples for the five-second interval and 18 × 10 × 4 = 720 samples for the 10-second interval. Of these samples, 80% were used for training and 20% for testing and validation: 288 training and 72 test samples for the five-second interval, and 576 training and 144 test samples for the 10-second interval.
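The sample-size arithmetic above can be checked with a few lines of Python:

```python
PARTICIPANTS, WORDS = 18, 10

def sample_counts(repeats, train_frac=0.8):
    """Total samples for a trial length, plus the 80/20 train/test split."""
    total = PARTICIPANTS * WORDS * repeats
    train = int(total * train_frac)
    return total, train, total - train

# Five-second trials: each word repeated twice per trial.
# Ten-second trials: each word repeated four times per trial.
for seconds, repeats in [(5, 2), (10, 4)]:
    total, train, test = sample_counts(repeats)
    print(f"{seconds}-second trials: {total} samples "
          f"({train} training / {test} testing)")
```

This reproduces the figures quoted in the text: 360 samples (288/72) for five-second trials and 720 samples (576/144) for 10-second trials.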

The Raspberry Pi 3 served as the local host, where the received raw data was pre-processed to remove artifacts and noise and stored in memory in an Excel-sheet format. The Raspberry Pi 3 also communicated with the cloud system and displayed the final text result received from it. The algorithm and the GUI used to communicate and display the messages were written in Python.

An AWS message broker client was installed on the Raspberry Pi 3, and an MQTT publisher and MQTT subscriber were configured. The MQTT publisher pushed the pre-processed data to the cloud system, and the MQTT subscriber received the resulting text from the cloud system. This final text was shown on the Raspberry Pi 3 display.
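The publish/subscribe arrangement can be sketched with the paho-mqtt client library. The topic names and broker address below are assumptions (the paper does not list them), and a real AWS IoT Core endpoint would additionally require TLS certificates, which are omitted here:

```python
import json

# Topic names are assumptions; the paper does not state its topics.
TOPIC_DATA = "eeg/preprocessed"
TOPIC_TEXT = "eeg/result"

def make_payload(subject_id, samples):
    """Serialize one pre-processed EEG sample for the MQTT publisher."""
    return json.dumps({"subject": subject_id, "eeg": list(samples)})

def run(broker_host="broker.local", port=1883):
    """Connect, publish pre-processed data, and wait for the result text.

    Requires `pip install paho-mqtt` (1.x-style API shown; paho-mqtt 2.x
    also takes a CallbackAPIVersion argument). An AWS IoT Core endpoint
    would also need certificates configured via client.tls_set().
    """
    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        # Subscriber callback: show the classified word on the Pi display.
        print("Classified word:", msg.payload.decode())

    client = mqtt.Client()
    client.on_message = on_message
    client.connect(broker_host, port)
    client.subscribe(TOPIC_TEXT)
    client.publish(TOPIC_DATA, make_payload("s01", [12, 15, 9]))
    client.loop_forever()
```

Keeping serialization in `make_payload` separate from the networking in `run` makes the data format testable without a live broker.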

Fig. 1. Response with no indicator.

Fig. 2. Response with indicator and data after segmentation into separate samples.

Table 1. Comparison of accuracy with respect to the indicator.

Data acquisition method | Sample size | Accuracy (%)
Without indicator       | 180         | 58.4
With indicator          | 360         | 77.3

3.2 Cloud System

The proposed system used Amazon Web Services (AWS) as the cloud platform, which executed the machine-learning and classification code. AWS provides various services, such as AWS Lambda, a serverless service, and Amazon SageMaker, which requires a dedicated server to implement machine-learning algorithms. These services support various operating system platforms and programming languages.

The messages received from the AWS message broker were processed and integrated by the Rules engine, which selects data from the message payloads, processes them, and forwards them to the AWS Lambda service and Amazon DynamoDB. Amazon DynamoDB is a multi-master, internet-scale cloud database with built-in security that can handle more than 20 million requests per second.

In AWS Lambda, the data was pre-processed and prepared for the machine-learning algorithm. AWS Lambda also hosts the services related to machine learning and data classification, and it was implemented along with Amazon Kinesis, a data-streaming service provided by AWS. A convolutional neural network (CNN) was used as the classification algorithm. Here, the data was first converted from a one-dimensional data stream to a two-dimensional representation, which was fed to the CNN in image form.
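The paper does not give the exact two-dimensional shape used; a minimal sketch of folding the 1-D stream into a 2-D array, with the row width left as a free parameter, is:

```python
def to_2d(stream, width):
    """Fold a 1-D EEG stream into a 2-D 'image' for the CNN.

    `width` is an assumed free parameter; the original study does not
    state its 2-D dimensions. The stream is truncated so it divides
    evenly into rows.
    """
    rows = len(stream) // width
    return [stream[r * width:(r + 1) * width] for r in range(rows)]
```

For example, a 2,560-sample five-second window folded with `width=64` yields a 40 × 64 array that can be treated as a single-channel image.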

The input layer was connected to a convolutional layer with 20 filters and a 5×5 kernel. One batch-normalization layer was used to speed up training by reducing sensitivity to initialization. ReLU was used as the activation function. Two fully connected layers followed by a softmax layer served as the classification stage, computing the probability of each class. The classification layer output a one-dimensional array of size ten, one element per class.
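As a sanity check on this architecture, the spatial size after the 20-filter, 5×5 convolution follows the standard convolution-size formula. Stride 1 and no padding are assumptions here, since the paper does not specify them:

```python
def conv2d_output_shape(h, w, kernel=5, filters=20, stride=1, padding=0):
    """Output shape (height, width, channels) of the 20-filter, 5x5
    convolutional layer, using the standard size formula
    out = (in - kernel + 2*padding) // stride + 1."""
    out_h = (h - kernel + 2 * padding) // stride + 1
    out_w = (w - kernel + 2 * padding) // stride + 1
    return out_h, out_w, filters
```

For a hypothetical 32 × 40 input, this gives a 28 × 36 × 20 activation volume feeding the batch-normalization and fully connected layers.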

The final text result of the classifier was forwarded to the Amazon DynamoDB database and to the MQTT subscriber via the Rules engine.

4. Performance Analysis and Result Discussion

4.1 Maximum Latency

Table 2 lists the latency for different sample sizes. The latency depended on the throughput of the server and the network capacity.

Table 2. Accuracy and latency results.

Sample size | Accuracy | Latency (s)
720         | 82.1%    | Training = 1.5; test = 0.1283
360         | 77.3%    | Training = 0.7; test = 0.1144

4.2 Different Dataset Size

Increasing the dataset size generally increases the accuracy of a CNN. A comparison study was performed by varying the dataset size, and Table 2 lists the accuracy levels. As the table shows, the accuracy increased with the dataset size, but the latency also increased.

4.3 Accuracy and Loss

Fig. 4 presents the accuracy of classification versus the number of epochs.

Fig. 4 shows that the accuracy was above 70% after 140 epochs with the smaller training dataset. Increasing the training set size to 720 improved the accuracy to 82%.

Fig. 5 shows the loss as a function of the number of epochs. A minimum error/loss value of 8.2311e-08 was obtained after training the CNN for 144 epochs.

Fig. 3. Block diagram of the proposed IoT-based brain signal classifier.
Fig. 4. Accuracy of the classification with respect to the number of epochs.
Fig. 5. Mean Squared Error as a function of the number of the epochs.

5. Opportunities and Challenges

BCI research is still in its infancy: the maximum number of imagined words identified so far is six. Classifying more words is required to understand the actual needs of the patient. The challenge, however, lies in the data acquisition methods: considerable electrical interference, and even brain signals responsible for other actions, result in poor accuracy. These challenges open the door for further research.

6. Conclusion

Thought-to-text conversion using EEG signal analysis is challenging. An attempt was made to map ten specific words that express the basic needs of paralyzed people. The brain signals corresponding to these words were mapped to text with the help of a deep-learning algorithm; the imagined-speech-to-text mapping problem was thus converted into a classification problem. The accuracy of the mapping provides hope for realizing the proposed system in a practical scenario: approximately 82% accuracy was achieved using a single-channel EEG signal without feature extraction or signal processing. Future work will attempt to improve the accuracy of the proposed system by applying a signal-processing algorithm before feeding the data to the classifier.

REFERENCES

[1] Pereira J., Sburlea A. I., Müller-Putz G. R., "EEG patterns of self-paced movement imaginations towards externally-cued and internally-selected targets," Nature Scientific Reports, September 2018.
[2] Sereshkeh A. R., Trott R., Bricout A., Chau T., "EEG Classification of Covert Speech Using Regularized Neural Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, pp. 2292-2300, December 2017.
[3] Hickok G., "Computational neuroanatomy of speech production," Nature Reviews Neuroscience, Vol. 13, pp. 135-145, February 2012.
[4] Rahman K. A. A., Ibrahim B. S. K. K., Leman A. M., Jamil M. M. A., "Fundamental study on brain signal for BCI-FES system development," IEEE-EMBS Conference on Biomedical Engineering and Sciences, pp. 195-198, December 2012.
[5] Matsumoto M., Hori J., "Classification of silent speech using support vector machine and relevance vector machine," Applied Soft Computing, Vol. 20, pp. 95-102, November 2013.
[6] Xu X., Liu Q., Luo Y., Peng K., Zhang X., Meng S., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, Vol. 95, pp. 522-533, January 2019.
[7] Jafferson A. J., "A Review on Machine Learning Mechanisms for Imagined Speech Classification," Journal of Advanced Research in Dynamical and Control Systems, Vol. 12, No. 1, pp. 137-142, January 2020.
[8] Vijayakumar P., Abajieet, Balaji, "A Palm Vein Recognition System based on support vector machine," IEIE Transactions on Smart Processing and Computing, Vol. 8, No. 1, February 2019.
[9] Ganga R. C., Vijayakumar P., Badrinath P., Singh A. R., Singh M., "Drone Control Using EEG Signal," Journal of Advanced Research in Dynamical and Control Systems, Vol. 11, No. 4, pp. 2107-2113.
[10] Cooney C., Folli R., Coyle D., "Neurolinguistics Research Advancing Development of a Direct-Speech Brain-Computer Interface," iScience, Vol. 8, pp. 103-125, October 2018.
[11] Sereshkeh A. R., Yousefi R., Wong A. T., "Online classification of imagined speech using functional near-infrared spectroscopy signals," Journal of Neural Engineering, Vol. 16, November 2018.
[12] Valente G., Kaas A. L., Formisano E., Goebel R., "Optimizing fMRI experimental design for MVPA-based BCI control: Combining the strengths of block and event-related designs," NeuroImage, Vol. 186, pp. 369-381, 2019.
[13] Brumberg J. S., Krusienski D. J., Chakrabarti S., Gunduz A., Brunner P., Ritaccio A. L., Schalk G., "Spatio-temporal progression of cortical activity related to continuous overt and covert speech production in a reading task," PLoS ONE, Vol. 11, pp. 1-21, November 2016.
[14] Jahangiri A., Sepulveda F., "The Relative Contribution of High-Gamma Linguistic Processing Stages of Word Production, and Motor Imagery of Articulation in Class Separability of Covert Speech Tasks in EEG Data," Journal of Medical Systems, Vol. 43, 2019.
[15] Wang L., Zhang X., Zhong X., Zhang Y., "Analysis and classification of speech imagery EEG for BCI," Biomedical Signal Processing and Control, Vol. 8, pp. 901-908, 2013.
[16] Siuly S., Li Y., Zhang Y., "EEG Signal Analysis and Classification: Techniques and Applications," Health Information Science, 2016.
[17] Martin S., Brunner P., Iturrate I., Millán J. d. R., Schalk G., Knight R. T., Pasley B. N., "Word pair classification during imagined speech using direct brain recordings," Nature Scientific Reports, Vol. 6, May 2016.

Author

A. Joshua Jafferson

A. Joshua Jafferson is currently working as an Assistant Professor in the Department of Electronics and Communication Engineering at SRM Institute of Science and Technology, Chennai, Tamil Nadu, India. He is pursuing a PhD in the biomedical signal processing domain under the guidance of Dr. P. Vijayakumar. Earlier, he completed his Master's in Embedded Systems at SASTRA University (2008), Thanjavur, Tamil Nadu, India.

Vijayakumar Ponnusamy

Vijayakumar Ponnusamy completed his Ph.D. at SRM IST (2018) in applied machine learning for wireless communication (cognitive radio), his Master's in Applied Electronics at the College of Engineering, Guindy (2006), and his B.E. (ECE) at Madras University (2000). He is a certified "IoT Specialist" and "Data Scientist" and a recipient of the NI India Academic Award for excellence in research (2015). His current research interests are machine and deep learning, IoT-based intelligent system design, blockchain technology, and cognitive radio networks. He is a senior member of IEEE. He is currently working as an Associate Professor in the ECE Department, SRM IST, Chennai, Tamil Nadu, India.

Jovana Jović

Jovana Jović, MSc, is a junior research assistant and a teaching assistant at the Faculty of Information Technologies at Belgrade Metropolitan University, where she is also a Ph.D. student in Software Engineering. She completed her MSc studies at the Faculty of Electronic Engineering in Niš, University of Niš, in 2015. She has been employed at Belgrade Metropolitan University since 2015, where she teaches object-oriented programming, objects and data abstraction, introduction to software engineering, and software architecture design.

Miroslav Trajanovic

Miroslav Trajanovic, Professor at the Faculty of Mechanical Engineering, University of Niš, Niš, Serbia, has 30 years of experience in the application of IT in mechanical engineering and education. This experience includes using the most popular CAE programs, writing programs to solve various engineering problems, and educating students in IT. He is an expert in computer programming, CAD, the finite element method, and expert systems. He is the author of more than 140 scientific and professional papers (published up to 2010). He has also taken part in 15 scientific and professional team projects supported by the Serbian government and industry, and was project leader for 10 projects, mainly in IT and mechanical engineering, as well as two European FP6 and six FP7 projects.