Detecting Excessive Daytime Sleepiness with CNN and Commercial Grade EEG

Excessive daytime sleepiness (EDS) is a common symptom that has proved to be a good predictor of obstructive sleep apnea. The symptom has become the focus of various studies and computer-aided diagnostic tools in sleep medicine. However, current approaches to detecting excessive daytime sleepiness rely mainly on subjective features and place little emphasis on objective features such as brainwaves, even though a few studies show that Epworth sleepiness scale results correlate with brainwave signals that even a commercial-grade EEG can capture. To address this problem, this research compared the performance of three CNN architectures: the classic AlexNet architecture and two custom CNN architectures. The study was tested on 20 university students who took the Epworth sleepiness scale test beforehand. Each participant then underwent a 10-minute EEG session; the data were downsampled for normalization purposes, and the model tried to predict the EDS outcome from the participant's brainwave state. The model reaches 65% accuracy and 81% sensitivity with just under five minutes of initial training, a good result considering the small dataset.


Introduction
Obstructive sleep apnea is a severe sleep disorder in which the patient's breathing repeatedly stops and restarts during nighttime sleep. It affects, on average, approximately 6% of a country's population. Obstructive sleep apnea is commonly associated with excessive daytime sleepiness (EDS), or hypersomnia, another sleeping disorder in which the patient falls asleep repeatedly during the day [1], [2]. There are reports of an increase in the number of hypersomnia cases across the globe as the world adapts to the coronavirus. Many people have also found their biological clock changing and have become overly dependent on digital screens [3]. Fortunately, there is a long-established method for easy detection of excessive daytime sleepiness, invented by Dr. Murray Johns when he worked at the Epworth sleep center. The test is named accordingly and is known as the Epworth sleepiness scale [4]. It is a self-assessment questionnaire whose questions relate to the most common symptoms of EDS. This simple test has proved to be an excellent clinical instrument for detecting hypersomnia [5]. Its main drawback is that it relies heavily on subjective assessment, and trained professionals are still needed just to examine the self-assessment results. This makes the Epworth sleepiness scale a poorly scalable option in the post-pandemic world, especially in Indonesia, where social restrictions are in force [6]. In recent years, computer-aided clinical diagnosis has also gained momentum with the entry of AI into public health. There have been various attempts to detect sleep disorders in sleep medicine, for example [7], [8], and [9]. For excessive daytime sleepiness, various studies have tried to solve the problem. Research from an Iranian university detects obstructive sleep apnea using EDS as a benchmark and a decision tree as the classifier [10].
However, instead of using biological markers, that research still relies on self-assessment. Instead of basing the prediction on subjective assessment, this research uses brainwave data.
The experiment started with screening randomly selected university students for symptoms of excessive daytime sleepiness using the standard Epworth sleepiness scale test. We then divided the population into two classes: one positive for excessive daytime sleepiness symptoms and the other negative. Next, we recorded the data from each participant and applied min-max normalization. We then performed signal decomposition and labeling to extract the individual signals (namely the alpha, beta, and gamma signals) for the system. For the training data, we used the same normalization and frequency decomposition technique. Finally, we trained the classifier on this dataset and saved the trained model to a file for later use in the testing phase.
The training set used in this research is a dataset belonging to the Carnegie Mellon University Language Technologies Institute [13]. The data were recorded with a single-band handheld EEG device capable of recording brainwave data from the participant. The recording consists of alpha, beta, and theta waves, and the device also translates it into a preliminary estimate of the participant's state of mind using NeuroSky's proprietary algorithm. We used this estimate as a baseline for comparison at the end of the research. The baseline sampling rate used in the study was 512 Hz. The training set consists of 14,200 sessions taken from 25 participants, of which half are considered insufficient-attention sessions and 7,000 are considered good-attention sessions. The testing set consists of 2,000 sessions taken from 20 participants. In both the training and testing sets, each session consists of two seconds of brainwave recording. The data were limited to sessions that captured the participant's attention level and were associated with the Epworth sleepiness scale.

Data Acquisition
The data acquired in this research consist of training and testing data. The training data are public EEG research data from the Carnegie Mellon public research data archive, found on Kaggle. This dataset is usually used to classify confused students (interpreted here as a low attention state). Several papers have also used it, such as "Confused or not Confused: Disentangling Brain Activity from EEG Data Using Bidirectional LSTM Recurrent Neural Networks", "Multi-Task Learning for Commercial Brain-Computer Interfaces", and "Electroencephalography (EEG) Technology Applications and Available Devices" [14], [15], [16]. We extended the experiment by adding the Epworth sleepiness scale self-assessment for every participant, since a strong correlation has been found between EEG signal data and self-reported sleepiness scores [17]. The training data are provided in CSV format and were collected with a NeuroSky MindWave single-band headset, which follows the international 10-20 electrode placement standard [18]. The dataset contains 12,812 rows measuring the raw signal and the alpha, beta, and gamma signals, together with the self-reported sleepiness scale. The participants in this research were considered mentally healthy. The NeuroSky headset has a single channel, which has proved capable of collecting large amounts of data. The data take the form of floating-point numbers representing the brainwave and integers representing the mental state and sleepiness level. The data are described in Figure 2. The data and the experiment conducted here rely heavily on the EEG device. Figure 3 illustrates the standard electrode placement known as the 10-20 standard.

Data Sampling
Because the data obtained in this research are much larger than typical tabular data, a sampling process was conducted to simplify the analysis for each class. A total of 5,000 samples were taken and labeled as low or average attention level. The nominal sampling rate of the NeuroSky headset is 512 Hz, so the data were also trimmed and normalized to that specification, and data considered to be noise were omitted in the preprocessing phase.
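The per-class sampling step can be sketched as follows. This is only an illustration under stated assumptions: the function name sample_per_class, the per_class parameter, and the toy rows are hypothetical, and the paper does not state how its 5000 samples were drawn or split between classes.

```python
import random
from collections import defaultdict


def sample_per_class(rows, labels, per_class, seed=0):
    """Draw up to `per_class` rows at random from each label group.

    Hypothetical helper: the study's actual sampling procedure is not specified.
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    sampled = {}
    for label, members in groups.items():
        k = min(per_class, len(members))  # never ask for more rows than exist
        sampled[label] = rng.sample(members, k)
    return sampled


# Toy data: ten rows with alternating labels
# (0 = low attention, 1 = average attention).
rows = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
subset = sample_per_class(rows, labels, per_class=3)
print({label: len(members) for label, members in subset.items()})
```

Sampling the same number of rows from each class keeps the two classes balanced, which matters for a small dataset where one class could otherwise dominate training.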

Data Normalization
Because every eight rows of data represent a single aspect for the classifier, the data undergo a flattening process. Flattening is the common term for an operation that converts multidimensional data into a single dimension. This makes the data more compact to process later in the training phase [19]. In this phase we used the minimum-maximum process, popularly known as min-max normalization. It applies a linear transformation to the data to produce a more balanced dataset and therefore a fairer comparison across the dataset. Mathematically, it is expressed by Equation 1.
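As a minimal sketch of these two steps, the standard form of min-max normalization rescales each value x to x' = (x - min) / (max - min), which maps the data to the range [0, 1]. The function names and the target range are assumptions made for illustration; the exact range used in the study is not stated.

```python
def flatten(rows):
    """Concatenate a list of rows into one flat list."""
    return [value for row in rows for value in row]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` to the range [new_min, new_max].

    The [0, 1] default is the usual convention and an assumption here.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal has no spread; map everything to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


window = [[2.0, 4.0], [6.0, 10.0]]
flat = flatten(window)
normalized = min_max_normalize(flat)
print(flat)        # [2.0, 4.0, 6.0, 10.0]
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

Because the transformation is linear, the relative spacing between values is preserved; only the scale changes.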

Brainwave Frequency Labelling
Signal frequency is the most common criterion in health and clinical research involving EEG. By academic consensus in psychology, five main frequency bands are distinguished. They are usually ordered from low to high frequency and named after Greek letters: 0.5-4 Hz (delta), 4-8 Hz (theta), 8-13 Hz (alpha), 13-30 Hz (beta), and >30 Hz (gamma). Alpha waves are commonly associated with a waking, relaxed state of mind. Beta waves are associated with full attention. Theta waves are frequently associated with a sleepy individual, or serve as a biological marker of a stressed or heavily working brain. Last, delta waves, the lowest band, are often associated with a state of deep sleep [20].
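The band boundaries listed above can be written as a small lookup. This sketch only labels a given frequency; it does not perform the signal decomposition itself. The half-open intervals (a boundary value belongs to the higher band) and the name label_band are assumptions made for illustration.

```python
# (name, lower bound in Hz, upper bound in Hz), taken from the ranges above.
BANDS = [
    ("delta", 0.5, 4.0),
    ("theta", 4.0, 8.0),
    ("alpha", 8.0, 13.0),
    ("beta", 13.0, 30.0),
]


def label_band(freq_hz):
    """Return the band name for a frequency, or None below 0.5 Hz."""
    if freq_hz < 0.5:
        return None
    for name, low, high in BANDS:
        if low <= freq_hz < high:  # assumed convention: upper bound excluded
            return name
    return "gamma"  # everything from 30 Hz upward


for f in (2.0, 6.0, 10.0, 20.0, 40.0):
    print(f, label_band(f))
```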

Convolutional Neural Network (CNN)
The Convolutional Neural Network, popularly abbreviated as CNN, is a recent development in the ever-changing field of artificial intelligence and machine learning. Its popularity has increased in recent years because the rise of cloud computing and the collaborative movement around open-source machine learning frameworks such as TensorFlow and the OpenAI initiative have made CNN a favorite approach to machine learning problems, especially in computer vision. The architecture of CNN makes the network effective at handling multidimensional data. Its treatment of the individual pixels in the data lets the classifier thrive in image-based or spatial classification, which has become popular with the growth of big data on the internet [21]. The concept behind CNN was first introduced in a paper by the Japanese researcher Kunihiko Fukushima, who invented the Neocognitron at the NHK research laboratories in Kinuta, Setagaya. It later inspired the Turing Award winner Yann LeCun to develop and implement a fully fledged CNN classifier under the name LeNet, which incorporates the inventor's last name [22]. More than a decade later, a CNN model, AlexNet, won the ImageNet Large Scale Visual Recognition Challenge in 2012, outperforming more classical models such as SVM and other perceptron-based models. This win fueled the popularity of CNN among a wider audience, and it is one of the reasons CNN is still used today as a state-of-the-art approach to image recognition.

Architecture of the Convolutional Neural Network
The standard artificial neural network is a set of connected artificial neurons stacked into layers, in which the neurons themselves learn the best way to solve the problem. This is revolutionary compared to the traditional procedural programming paradigm, which handles problems case by case [23]. The multilayer perceptron (MLP), a well-known neural network architecture, has been shown to be capable of mapping equations under a versatile range of conditions and variable sets. It overcomes the limitation of the single perceptron, which has no hidden layer and is only suitable for problems with small datasets. Despite its strengths, the MLP has been overtaken by the advance of big data, which exposes its limits. The number of layers an MLP can support is limited: many experts report that an MLP loses its effectiveness beyond three layers, where it becomes prone to overfitting and reaches the point of diminishing returns. Deep learning, which is easy to implement with CNN, compensates for this weakness in handling large and complex data. A CNN transforms the input data into a form that is easier to feed to the network, which makes networks with hundreds of layers possible. This makes deep learning a newly found Swiss Army knife of the machine learning world. A typical CNN implementation consists of the following components.

a. Convolution Layer
The convolution layer performs the bulk of the convolution operations in the network. Whatever data arrives from the previous layer is processed repeatedly by a fixed mathematical function, and the result is then treated as the input of the following functions. The convolution operation is illustrated as follows.
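A minimal one-dimensional sketch of this operation is given here, assuming no padding and a stride of one. As in most CNN frameworks, the kernel is slid over the input without being flipped (strictly speaking a cross-correlation). The function name and the example kernel are illustrative and are not taken from the architectures compared in this study.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` and return one weighted sum per position.

    Assumes no padding and stride 1, so the output has
    len(signal) - len(kernel) + 1 elements.
    """
    n, k = len(signal), len(kernel)
    if k == 0 or k > n:
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


# A difference kernel responds to the slope of the signal.
signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
print(conv1d_valid(signal, kernel))  # [-2, -2, -2]
```

In a trained network the kernel values are not chosen by hand as they are here; they are the weights learned during training.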

b. Fully Connected Layer
A fully connected layer is a neural network layer that mimics the mechanism of a multilayer perceptron. The principal purpose of a fully connected layer is to transform multidimensional data into a simpler, lower-dimensional form, down to a scalar. By convention, the output of each neuron must first be transformed into one-dimensional data before it can be combined in a fully connected layer.
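The flatten-then-combine step can be sketched in a few lines. The names flatten_maps and dense, and the weights and biases, are made up for illustration; in a real network the weights are learned during training.

```python
def flatten_maps(feature_maps):
    """Turn a list of feature maps (lists of numbers) into one flat vector."""
    return [value for fmap in feature_maps for value in fmap]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    if len(weights) != len(biases):
        raise ValueError("need exactly one bias per output unit")
    for row in weights:
        if len(row) != len(inputs):
            raise ValueError("each weight row must match the input length")
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


feature_maps = [[1.0, 2.0], [3.0, 4.0]]
flat = flatten_maps(feature_maps)        # [1.0, 2.0, 3.0, 4.0]
weights = [[1.0, 0.0, 0.0, 0.0],         # illustrative values only
           [0.5, 0.5, 0.5, 0.5]]
biases = [0.0, 1.0]
print(dense(flat, weights, biases))      # [1.0, 6.0]
```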

c. Activation Layer
An activation function is a mathematical function that lets the network classify a dataset by dividing the feature space according to the criteria being learned. For binary classification, the function that comes to mind is the sigmoid function, illustrated in Figure 6 below.

Figure 6. Sigmoid Function
A sigmoid function is well suited to binary classification because its output always lies between zero and one.
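The sigmoid function is defined as s(z) = 1 / (1 + exp(-z)). A short sketch follows, with a 0.5 decision threshold that is assumed here as the usual convention for binary classification and is not taken from the study.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z is negative here, so exp(z) cannot overflow
    return e / (1.0 + e)


def predict_class(z, threshold=0.5):
    """Map a raw network output to class 1 or 0 (threshold is an assumption)."""
    return 1 if sigmoid(z) >= threshold else 0


print(sigmoid(0.0))         # 0.5
print(predict_class(3.0))   # 1
print(predict_class(-3.0))  # 0
```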

d. Dropout
Dropout is a machine learning technique for addressing overfitting in deep learning. Popularized in a paper by a University of Toronto team led by Nitish Srivastava, the dropout layer offers a simple idea: randomly drop neural network units, along with their connections, during the training phase. This prevents the neurons from co-adapting too much. Training in this way samples from many different thinned networks. At test time, the effect of averaging the predictions of these thinned networks is approximated simply by using a single unthinned network with smaller weights. It has been proven to reduce overfitting and significantly boost the performance of deep learning or CNN-powered neural networks [24].
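The two phases described above can be sketched as follows: units are randomly zeroed during training, and at test time the full network is used with its outputs scaled down by the retention probability. The function names and the keep_prob value of 0.5 are assumptions for illustration; the dropout rate used in the study is not stated.

```python
import random


def dropout_train(activations, keep_prob, rng):
    """Training phase: keep each unit with probability `keep_prob`, else zero it."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [a if rng.random() < keep_prob else 0.0 for a in activations]


def dropout_test(activations, keep_prob):
    """Test phase: no units are dropped; outputs are scaled by `keep_prob`
    so their expected value matches what the next layer saw in training."""
    return [a * keep_prob for a in activations]


rng = random.Random(0)  # seeded only to make the sketch reproducible
units = [1.0] * 1000
thinned = dropout_train(units, keep_prob=0.5, rng=rng)
print(sum(thinned) / len(thinned))             # close to 0.5
print(dropout_test([2.0, 4.0], keep_prob=0.5))  # [1.0, 2.0]
```

Many frameworks use an equivalent variant, often called inverted dropout, that scales the kept units up during training so that no scaling is needed at test time.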

Epworth Sleepiness Scale
The Epworth sleepiness scale is a classic method in sleep medicine, commonly used to evaluate the level of general sleepiness among participants. It is commonly used as a clinical predictor of hypersomnia, or excessive daytime sleepiness, which in turn is a good predictor of obstructive sleep apnea. Over time it has become one of the methods used and refined in recent decades, and the scale has repeatedly proven effective, as shown in [25], a study of children's hypersomnia in Indonesia and in other parts of the world.
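For reference, the Epworth sleepiness scale sums eight self-rated items, each scored from 0 to 3, giving a total between 0 and 24. The scoring can be sketched as shown here. The cut-off of 10 is a commonly cited convention and is an assumption in this sketch; the threshold used to split the two classes in this study is not stated.

```python
ESS_ITEMS = 8        # number of questionnaire items
ESS_MAX_PER_ITEM = 3  # each item is rated 0, 1, 2, or 3


def ess_total(item_scores):
    """Validate the eight item scores and return the total (0 to 24)."""
    if len(item_scores) != ESS_ITEMS:
        raise ValueError("the scale has exactly eight items")
    for score in item_scores:
        if score not in range(ESS_MAX_PER_ITEM + 1):
            raise ValueError("each item must be an integer from 0 to 3")
    return sum(item_scores)


def has_excessive_sleepiness(item_scores, cutoff=10):
    """Label a respondent as EDS-positive when the total exceeds `cutoff`.

    The default cut-off is an assumed convention, not the study's value.
    """
    return ess_total(item_scores) > cutoff


print(ess_total([1, 2, 0, 3, 1, 2, 1, 2]))                 # 12
print(has_excessive_sleepiness([1, 2, 0, 3, 1, 2, 1, 2]))  # True
```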

Testing Method
Because we tried to emulate the effectiveness of a clinically proven method, the Epworth sleepiness scale, we focused on model accuracy. We used the confusion matrix as the basis for measuring model performance. The metrics were not limited to accuracy; we also evaluated the system using other popular metrics such as recall, precision, and F-score. The model's predictions were compared with the self-assessment made using the Epworth sleepiness scale, and the test results were used as the sole indicator of the classifier's success. The classifier was tested with data from 50 healthy university students aged 19-27 in Denpasar city, with various biological clocks and sleeping patterns. The experiment results are presented in the next section.
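The metrics named above all derive from the four cells of a binary confusion matrix. A minimal sketch follows, with class 1 assumed to mean positive for excessive daytime sleepiness; the labels in the example are toy values and not results from this study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall (sensitivity), and F-score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Toy labels: questionnaire outcome vs. classifier prediction.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
print(classification_metrics(y_true, y_pred))
```

Recall is the same quantity as the sensitivity reported for the classifier: the share of EDS-positive participants that the model identifies.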

Results and Discussion
The research focuses on detecting excessive daytime sleepiness. It starts with data collection, in which data are collected from the volunteers and then normalized using Equation 1. The signal is decomposed into the frequency bands illustrated in Figure 4, and we then predict the new participant data using the pre-trained model. In the experiment, the three CNN architectures illustrated in Figure 5 are tested and compared. We chose to compare AlexNet and two custom CNN architectures to find the optimal result. The experiment results are listed in Table 1. We compared our artificial intelligence (AI) predictions with the self-assessment that each participant performed, to see how many of the predictions agreed with the established manual method of detecting excessive daytime sleepiness. We list the outcomes and perform a statistical evaluation of the data to obtain the classifier's metrics, namely accuracy, precision, and recall. Based on the test results in Table 1, the custom CNN + Dropout method outperforms the custom CNN method by 5% and the classic AlexNet method by 13%. The comparison results were then used to assess the agreement between the Epworth sleepiness scale results from the manual questionnaire and the EEG-based analysis, divided into two conditions: Normal Daytime Sleepiness (Normal DS) and Excessive Daytime Sleepiness (Excessive DS). In randomized testing, the classifier produces a reasonably satisfying result. Its success rate in predicting excessive daytime sleepiness is 65% accuracy, which is likely a consequence of the limited training set. The complete metrics measured for the classifier's performance are presented below.

Conclusion
The classifier produces good results, with accuracy peaking at 65% after the addition of the dropout layer. This excessive sleepiness classifier performs well on the sensitivity metric, reaching 86% compared to the standard architecture. The addition of the dropout layer slightly increased the performance of the classifier. Future work is needed to investigate the correlation between data size and overall classifier performance, since the dataset we collected would be considered small compared to other research in the EEG field. Further studies comparing the performance of various EEG devices on this problem also have great potential, although they may be costly compared to our low-cost solution. Improved preprocessing should also be considered, since EEG data at large scale are very prone to noise if not handled properly.