Baek Jaewoo, Baek Suwan, Yu Hyunsu, Lee Junghwan, Park Cheolsoo*
(Department of Computer Engineering, Kwangwoon University, Seoul, Korea;
jw03070@naver.com, zhsjzhsj@gmail.com, byeng3@kw.ac.kr, hjn040281@gmail.com,
parkcheolsoo@kw.ac.kr)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Keywords
Sleep stages, Automatic classification, EEG, 1D-CNN, bi-LSTM
1. Introduction
Although sleep is a very important factor in our lives [1], many people suffer from sleep disorders such as insomnia, narcolepsy, and sleep
apnea [2]. They therefore visit hospitals or sleep centers to test and evaluate their sleep
quality. To diagnose sleep disorders, the sleep stages must be classified and analyzed
in order to estimate sleep quality [3].
According to the American Academy of Sleep Medicine (AASM) standard [4], there are five sleep stages: wakefulness (W), REM sleep (REM), and three non-REM
stages (N1-N3). These stages are determined from measured polysomnography (PSG)
signals, including electroencephalography (EEG), electrocardiography (ECG), electrooculography
(EOG), and electromyography (EMG). PSG signals are divided into 30-second segments,
called epochs, and each epoch is assigned a sleep stage.
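As a concrete illustration, a raw single-channel recording sampled at 100 Hz can be segmented into 30-second epochs of 3,000 samples each. The following NumPy sketch shows this segmentation (the function and variable names are illustrative, not taken from the authors' code):

```python
import numpy as np

FS = 100                   # sampling rate in Hz
EPOCH_SEC = 30             # epoch length in seconds
SAMPLES = FS * EPOCH_SEC   # 3000 samples per epoch

def segment_into_epochs(signal):
    """Split a 1-D signal into non-overlapping 30-second epochs,
    dropping any incomplete trailing segment."""
    n_epochs = len(signal) // SAMPLES
    return signal[: n_epochs * SAMPLES].reshape(n_epochs, SAMPLES)

# Example: one hour of recording yields 120 epochs of 3000 samples each
one_hour = np.zeros(3600 * FS)
epochs = segment_into_epochs(one_hour)
print(epochs.shape)  # (120, 3000)
```

Each row of the resulting array corresponds to one labeled epoch.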
So far, sleep experts have manually classified the five sleep stages from PSG
signals according to the AASM standard. This manual approach is time-consuming and labor-intensive.
Moreover, it produces unstable and inconsistent results because it depends on the subjective
judgment of the sleep expert [5]. To solve these problems, automatic sleep staging models based on machine learning
and deep learning have been proposed in several studies [6-10]. In this study, we propose a state-of-the-art model with higher performance than
those of previous studies. We employ an end-to-end model built from deep neural networks:
a one-dimensional convolutional neural network (1D-CNN) with InceptionTime [11] modules
and a bidirectional long short-term memory (bi-LSTM) network [12].
2. The Proposed Method
In this study, we propose a model that requires no preprocessing, which reduces
training time and helps the model generalize. Fig. 1 illustrates the proposed model. The
features of each EEG epoch are extracted by inception module layers inspired by
InceptionTime [11], combined with an ensemble method; ensemble learning yields high classification performance
and reduces overfitting [13]. Additionally, we use a bidirectional LSTM so that the model can learn the stage-transition
rules that sleep experts rely on in manual sleep scoring.
We performed a two-step training process with different numbers of learning epochs,
inspired by DeepSleepNet [8]. Since the CNN layers used for feature extraction have higher complexity than the
bidirectional-LSTM layers, the CNN layers do not need many learning epochs. The two-step
training process therefore prevents overfitting of the CNN part of the model.
Fig. 1. Model architecture, consisting of the two-step training process.
2.1 Epoch Feature Extraction
In this study, we used a 1D-CNN-based InceptionTime module to extract the features
of each epoch. InceptionTime, proposed by Ismail Fawaz et al. [11], is a state-of-the-art model for time-series classification
based on the Inception network [14], which was designed for image classification tasks. It applies the receptive-field
concept to time-series data when tuning the CNN filter sizes [15]. A large receptive field detects large patterns better, and a small receptive field
detects smaller patterns better. Therefore, instead of using a single filter size
for the CNN layers, various filter sizes were used.
Each convolutional layer in the inception module has 32 filters and a stride of 1.
The features passed through the bottleneck layer enter three convolutional layers
with filter sizes of 10, 20, and 40. In addition, the feature passed through a max-pooling
layer enters a convolutional layer with a filter size of 1, and the resulting four
feature maps are concatenated.
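To make the branch structure concrete, the NumPy sketch below mimics the shapes involved: three convolutional branches with kernel sizes 10, 20, and 40 plus a max-pooling branch followed by a size-1 convolution, concatenated into 4 × 32 = 128 feature channels. This is an illustrative sketch with random weights (the real model uses trained 1D-CNN layers, and the bottleneck layer is omitted here for brevity):

```python
import numpy as np

def conv1d_same(x, kernel):
    # 'same'-padded 1-D convolution of a single-channel signal
    return np.convolve(x, kernel, mode="same")

def inception_module(x, n_filters=32, kernel_sizes=(10, 20, 40), rng=None):
    """Toy inception module: multi-scale conv branches + pooled branch."""
    rng = np.random.default_rng(0) if rng is None else rng
    branches = []
    # Three conv branches with different receptive fields
    for k in kernel_sizes:
        kernels = rng.standard_normal((n_filters, k))
        branches.append(np.stack([conv1d_same(x, w) for w in kernels]))
    # Max-pooling branch (crude size-3, stride-1 pool) + size-1 conv (scaling)
    pooled = np.maximum.reduce([np.roll(x, s) for s in (-1, 0, 1)])
    w1 = rng.standard_normal(n_filters)
    branches.append(np.stack([c * pooled for c in w1]))
    # Concatenate along the channel axis: 4 branches x 32 filters = 128
    return np.concatenate(branches, axis=0)

x = np.random.default_rng(1).standard_normal(3000)  # one 30-s epoch at 100 Hz
out = inception_module(x)
print(out.shape)  # (128, 3000)
```

The point of the sketch is the shape bookkeeping: regardless of kernel size, 'same' padding keeps the time axis at 3,000 samples, so the four branches concatenate cleanly along the channel axis.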
2.2 Epoch Sequence Learning
We used three bidirectional-LSTM layers to learn the correlations between the features
of consecutive epochs. According to the AASM standard [4], if sleep spindles or a K-complex occur in a record that otherwise meets the requirements
of the N1 stage (low-amplitude, mixed-frequency activity), the epochs before and after
are scored as the N2 stage. To learn this rule, the proposed model treats 10 consecutive epochs
as one sequence when classifying the N1 and N2 stages. A bidirectional LSTM merges
two LSTMs that process the sequence forward and backward, so both past and future
information can be learned.
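The scoring rule above can be illustrated with a simplified, hypothetical post-hoc function; note that the actual model learns this behavior from data via the bi-LSTM rather than applying an explicit rule:

```python
def apply_n2_rule(stages, has_spindle_or_k):
    """Simplified illustration of the AASM rule: an N1-like epoch that
    contains a sleep spindle or K-complex, together with the adjacent
    N1 epochs, is scored as N2. Inputs are parallel lists per epoch."""
    out = list(stages)
    for i, marker in enumerate(has_spindle_or_k):
        if marker and stages[i] == "N1":
            for j in (i - 1, i, i + 1):
                if 0 <= j < len(out) and out[j] == "N1":
                    out[j] = "N2"
    return out

# A spindle in the middle epoch promotes it and its N1 neighbors to N2
print(apply_n2_rule(["N1", "N1", "N1", "W"], [False, True, False, False]))
# ['N2', 'N2', 'N2', 'W']
```

Because the rule depends on both the preceding and the following epochs, a bidirectional architecture is a natural fit: the backward LSTM supplies the "future" context a forward-only model would lack.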
2.3 Model Hyperparameters
The hyperparameters used in this study are listed in Table 1; we selected them manually by trial and error. The number of first learning epochs
is smaller than the number of second learning epochs because the complexity of the
feature-extraction part is higher than that of the epoch-sequence-learning part. The
batch size is 80; however, to connect epoch feature extraction with epoch sequence
learning, each batch is reshaped into (8, 10) tuples instead of 80 individual epochs.
Thus, the first and second axes of the input shape are batch axes, the third axis
is the data length, and the fourth axis is the channel. Four random seeds are used
for ensemble learning.
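The batch reshaping can be sketched as follows (illustrative NumPy code, not the authors' implementation): 80 epochs per batch become 8 sequences of 10 consecutive epochs, where each epoch is 3,000 samples of one channel.

```python
import numpy as np

# 80 epochs of 3000 single-channel samples, flat within the batch
flat_batch = np.zeros((80, 3000, 1))

# Regroup into 8 sequences of 10 consecutive epochs for the bi-LSTM,
# matching the input shape (8, 10, 3000, 1) in Table 1
seq_batch = flat_batch.reshape(8, 10, 3000, 1)
print(seq_batch.shape)  # (8, 10, 3000, 1)
```

This keeps the total of 80 epochs per batch while giving the sequence-learning part contiguous runs of 10 epochs to score jointly.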
Table 1. Model hyperparameters.

  Hyperparameter         | Value
  -----------------------|--------------------
  Optimizer              | Adam
  Learning rate          | 0.0001
  Batch size             | 80
  First learning epoch   | 10
  Second learning epoch  | 50
  Weight initializer     | GlorotUniform
  Random seeds           | 0, 777, 1234, 1479
  Input shape            | (8, 10, 3000, 1)
3. Performance Evaluation
3.1 Dataset
We employed the Sleep-EDF dataset [16], which contains two PSG records per participant and is divided into two participant groups.
The SC group consists of participants who did not take sleep-related drugs, and the
ST group consists of participants who took temazepam in a study of its effect as a drug
for treating insomnia. Each PSG record contains EEG, EOG, and chin EMG signals; the
EEG signals comprise the Fpz-Cz and Pz-Oz channels.
The EEG and EOG signals in the PSG are each sampled at 100 Hz, and each 30-second epoch
has a sleep stage label. The records were classified and labeled with six stages by
sleep experts based on the R&K standard [17]. To meet the AASM standard, we merged R&K stages 3 and 4 into the N3 stage,
yielding five stages. We used only the PSG records of the SC group, i.e., signals
from healthy participants, and used the Fpz-Cz EEG channel. From the Sleep-EDF
database, we used 20 participants of the SC group. Since participant 13 has only
one record due to disk loss, we used a total of 39 PSG records.
3.2 Performance Evaluation
To evaluate generalization performance, we used k-fold cross-validation [18]. Specifically, we performed leave-one-patient-out cross-validation so that PSG
records from the same subject never appear in both the training set and the test set.
This method evaluates whether the model is practical for real-world applications.
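A leave-one-patient-out split can be sketched in plain Python (the names and record structure are illustrative): all records of the held-out subject go to the test set, and everything else goes to training.

```python
def leave_one_subject_out(records):
    """records: list of (subject_id, record_id) pairs.
    Yields one (train, test) split per subject, with the subject's
    records held out entirely from training."""
    subjects = sorted({sid for sid, _ in records})
    for held_out in subjects:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield train, test

# Example: three subjects, two records each except one subject with one record
recs = [(0, "a"), (0, "b"), (1, "a"), (1, "b"), (13, "a")]
splits = list(leave_one_subject_out(recs))
print(len(splits))  # 3 folds, one per subject
```

The essential property, unlike a naive record-level k-fold, is that no subject contributes records to both sides of any split.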
We evaluated our model performance using five metrics: overall accuracy (ACC),
per-class recall (RE), per-class precision (PRE), per-class F1 score (F1), and macro-averaged
F1 score (MF1). Eqs. (1)-(5) show the calculation of each metric:

ACC = (TP + TN) / (TP + TN + FP + FN)        (1)
RE = TP / (TP + FN)                          (2)
PRE = TP / (TP + FP)                         (3)
F1 = 2 * PRE * RE / (PRE + RE)               (4)
MF1 = (1/C) * Σ_{i=1}^{C} F1_i               (5)

where C is the number of classes, and TP, TN, FN, and FP are the numbers of true
positives, true negatives, false negatives, and false positives, respectively. Table 2 shows the confusion matrix for the five sleep stages, and Table 3 shows the performance metrics of each sleep stage obtained through 20-fold cross-validation.
We averaged the F1 score for each sleep stage and calculated the overall accuracy.
The macro-averaged F1 score was 79.05%, and the accuracy was 85.05%.
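These figures can be reproduced directly from the confusion matrix in Table 2. The NumPy sketch below computes overall accuracy and the per-class metrics using the standard definitions (per-class values agree with Table 3 up to small rounding differences from fold-wise averaging):

```python
import numpy as np

# Confusion matrix from Table 2 (rows: true stage, columns: predicted stage)
# Stage order: W, N1, N2, N3, REM
cm = np.array([
    [7346,  507,   123,   25,  156],
    [ 524, 1158,   641,   12,  469],
    [ 437,  384, 15686,  690,  602],
    [  36,    3,   556, 5108,    0],
    [ 215,  201,   721,    6, 6574],
])

tp = np.diag(cm).astype(float)
recall = tp / cm.sum(axis=1)       # per-class recall (RE)
precision = tp / cm.sum(axis=0)    # per-class precision (PRE)
f1 = 2 * precision * recall / (precision + recall)

acc = tp.sum() / cm.sum()          # overall accuracy
mf1 = f1.mean()                    # macro-averaged F1

print(f"ACC = {acc:.2%}, MF1 = {mf1:.2%}")  # ACC = 85.05%, MF1 = 79.05%
```

Both numbers match the values reported in the text, which is a quick sanity check that the confusion matrix and the summary metrics are consistent.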
Table 2. Confusion matrix for the five sleep stages.

  True \ Predicted |    W |   N1 |    N2 |   N3 |  REM
  -----------------|------|------|-------|------|-----
  W                | 7346 |  507 |   123 |   25 |  156
  N1               |  524 | 1158 |   641 |   12 |  469
  N2               |  437 |  384 | 15686 |  690 |  602
  N3               |   36 |    3 |   556 | 5108 |    0
  REM              |  215 |  201 |   721 |    6 | 6574
Table 3. Per-class performance: recall, precision, and F1 score.

          |  Wake |    N1 |    N2 |    N3 |   REM
  --------|-------|-------|-------|-------|------
  RE (%)  | 90.06 | 41.30 | 88.13 | 89.87 | 85.19
  PRE (%) | 85.83 | 51.40 | 88.49 | 87.45 | 84.27
  F1 (%)  | 87.90 | 45.80 | 88.85 | 88.50 | 84.73
3.3 Benchmark Test
Table 4 shows the performance metrics from other studies using the same dataset and a single
EEG channel. Compared with the others, our proposed model achieves the highest overall
accuracy and macro-averaged F1 score. Therefore, our model has better generalization
performance than the other approaches.
Table 4. Comparison of the proposed model with other approaches.

  Study                 | ACC (%) | MF1 (%) |  Per-class F1 score (%)
                        |         |         |    W |   N1 |   N2 |   N3 |  REM
  ----------------------|---------|---------|------|------|------|------|-----
  Tsinalis et al. [19]  |   78.9  |   73.7  | 71.6 | 47.0 | 84.6 | 84.0 | 81.4
  IITNet [20]           |   84.0  |   77.7  | 87.9 | 44.7 | 88.0 | 85.7 | 82.1
  DeepSleepNet [8]      |   82.0  |   76.9  | 84.7 | 46.6 | 85.9 | 84.8 | 82.4
  Zhu et al. [10]       |   82.8  |   77.8  | 90.3 | 47.1 | 86.0 | 82.1 | 83.2
  Eldele et al. [21]    |   84.4  |   78.1  | 89.7 | 42.6 | 88.8 | 90.2 | 79.0
  Proposed              |   85.1  |   79.1  | 87.9 | 45.8 | 88.9 | 88.5 | 84.7
4. Conclusion
Our model for automatic sleep staging was designed using an ensemble method and two-step
training based on a 1D-CNN and a bidirectional LSTM. With these techniques, it achieved
the highest classification performance among the compared models. Since our model requires
no preprocessing, it is lighter and more generalizable than other deep learning sleep
stage classification models. However, because of the small number of N1 epochs, the
N1 classification performance is still weak. In future work, we will improve our
algorithm to address this sleep stage imbalance problem.
ACKNOWLEDGMENTS
This research was supported by the MSIT (Ministry of Science and ICT) under the
National Program for Excellence in SW (2017-0-00096), supervised by the IITP
(Institute for Information & Communications Technology Promotion).
REFERENCES
Mukherjee S., Patel S. R., Kales S. N., Ayas N. T., Strohl K. P., Gozal D., Malhotra
A., 2015, An official American Thoracic Society statement: the importance of healthy
sleep. Recommendations and future priorities., American journal of respiratory and
critical care medicine, Vol. 191, No. 12
Chokroverty S., 2010, Overview of sleep & sleep disorders, Indian J. Med. Res., Vol.
131, No. 2, pp. 126-140
Krystal A. D., Edinger J. D., 2008, Measuring sleep quality, Sleep Med., Vol. 9, No.
suppl. 1, pp. 10-17
Berry R. B., et al. , 2017, AASM scoring manual updates for 2017 (version 2.4), J.
Clin. Sleep Med., Vol. 13, No. 5, pp. 665-666
Whitney C. W., et al. , 1998, Reliability of scoring respiratory disturbance indices
and sleep staging, Sleep, Vol. 21, No. 7, pp. 749-757
Fraiwan L., Lweesy K., Khasawneh N., Wenz H., Dickhaus H., 2012, Automated sleep stage
identification system based on time-frequency analysis of a single EEG channel and
random forest classifier, Comput. Methods Programs Biomed., Vol. 108, No. 1, pp. 10-19
Shen X., Fan Y., 2012, Sleep stage classification based on eeg signals by using improved
hilbert-huang transform, Appl. Mech. Mater., Vol. 138-139, pp. 1096-1101
Supratak A., Dong H., Wu C., Guo Y., 2017, DeepSleepNet: A model for automatic sleep
stage scoring based on raw single-channel EEG, IEEE Trans. Neural Syst. Rehabil. Eng.,
Vol. 25, No. 11, pp. 1998-2008
Phan H., Andreotti F., Cooray N., Oliver Chen Y., De Vos M., 2018, DNN Filter Bank
Improves 1-Max Pooling CNN for Single-Channel EEG Automatic Sleep Stage Classification,
Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. EMBS, Vol. 2018-july, pp. 453-456
Zhu T., Luo W., Yu F., 2020, Convolution-and attention-based neural network for automated
sleep stage classification, Int. J. Environ. Res. Public Health, Vol. 17, No. 11,
pp. 1-13
Ismail Fawaz H., et al. , 2020, InceptionTime: Finding AlexNet for time series classification,
Data Min. Knowl. Discov., Vol. 34, No. 6, pp. 1936-1962
Graves A., Mohamed A., Hinton G., 2013, Speech recognition with deep recurrent
neural networks, Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP),
pp. 6645-6649
Dietterich T. G., 2002, Ensemble learning, The Handbook of Brain Theory and Neural
Networks, 2nd ed., pp. 110-125
Luo W., Li Y., Urtasun R., Zemel R., 2016, Understanding the effective receptive field
in deep convolutional neural networks, Adv. Neural Inf. Process. Syst., no. Nips,
pp. 4905-4913
Goldberger A. L., Amaral L. A., Glass L., Hausdorff J. M., Ivanov P. C., Mark R. G.,
Stanley H. E., 2000, PhysioBank, PhysioToolkit, and PhysioNet: components of a new
research resource for complex physiologic signals, Circulation, Vol. 101, No. 23,
pp. e215-e220
T. Hori, et al. , 2001, Proposed supplements and amendments to ‘A Manual of Standardized
Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects’, the
Rechtschaffen & Kales (1968) standard, Psychiatry Clin. Neurosci., Vol. 55, No. 3,
pp. 305-310
Stone M., 1974, Cross-Validatory Choice and Assessment of Statistical Predictions,
J. R. Stat. Soc. Ser. B, Vol. 36, No. 2, pp. 111-133
Tsinalis O., Matthews P. M., Guo Y., 2016, Automatic Sleep Stage Scoring Using Time-Frequency
Analysis and Stacked Sparse Autoencoders, Ann. Biomed. Eng., Vol. 44, No. 5, pp. 1587-1597
Seo H., Back S., Lee S., Park D., Kim T., Lee K., 2020, Intra- and inter-epoch temporal
context network (IITNet) using sub-epoch features for automatic sleep scoring on raw
single-channel EEG, Biomed. Signal Process. Control, Vol. 61
Eldele E., et al. , 2021, An Attention-Based Deep Learning Approach for Sleep Stage
Classification With Single-Channel EEG, in IEEE Transactions on Neural Systems and
Rehabilitation Engineering, Vol. 29, pp. 809-818
Author
JaeWoo Baek received his B.S. degree in computer engineering from Kwangwoon University
in Seoul, South Korea. His research interests include biological signal processing,
machine learning, deep learning, and reinforcement learning.
SuHwan Baek received his B.S. degree in computer engineering from Kwangwoon University
in Seoul, South Korea. His research interests include overall Medical AI and Auto
ML (ENAS). He is also attracted to reinforcement learning and generative models.
Hyunsu Yu received his B.S. degree in robotics engineering from Kwangwoon University
in Seoul, South Korea. His research interests include experimental design, signal
processing, machine learning, and artificial intelligence.
Junghwan Lee is in the MSc Program at the Bio Computing & Machine Learning Laboratory
(BCML) in the Department of Computer Engineering at Kwangwoon University, Seoul, Republic
of Korea. His research interests include machine learning and deep learning algorithms.
Cheolsoo Park is an associate professor in the Computer Engineering Department
at Kwangwoon University, Seoul, South Korea. He received a B.Eng. in Electrical Engineering
from Sogang University, Seoul, and an MSc from the Biomedical Engineering Department
at Seoul National University, South Korea. In 2012, he received his PhD in Adaptive
Nonlinear Signal Processing from Imperial College London, U.K., and worked as a postdoctoral
researcher in the Bioengineering Department at the University of California, San Diego,
U.S.A. His research interests are mainly in the areas of machine learning and adaptive
and statistical signal processing, with applications in brain computer interfaces, computational
neuroscience, and wearable technology.