Frequency Band and PCA Feature Comparison for EEG Signal Classification

The frequency band method is popular in signal processing; it separates EEG signals into five frequency bands. Recent research also shows that PCA gives good results when classifying digits from EEG signals. Although PCA achieves good accuracy on this task, no research has shown whether PCA or the frequency band method yields better accuracy for classifying digits from EEG signals. This paper compares the two methods using secondary data from MindBigData (MDB). The results show that the frequency band and PCA methods achieve average accuracies of 9% and 12.5%, respectively, on the EPOC dataset. A paired Wilcoxon test indicates a significant difference in accuracy between the two methods on the digit classification problem. An experiment with the Muse dataset gives 31% accuracy with the frequency band method and 24.8% with the PCA method. These results are competitive with other experiments on classifying digits from EEG signals. In conclusion, there is no winner between the two methods, since neither method performs best on both datasets used in this research.


Introduction
Digital signal processing (DSP) is a complex task, yet a very active research topic. One of the most popular problems in DSP is how to classify signals into meaningful information. Voice recognition is one example of how DSP has enabled things that were not possible before. Someone can use a phone to send a message with a voice command, or turn a car on and off with a hand clap. What felt impossible in the past has become a reality. Even more surprising is the use of brainwaves, which has recently become increasingly widespread, ranging from detecting brain disease to moving robot hands. One of the most interesting applications is using brainwaves to control or interface with computer screens. Brainwaves arise from the interaction of neurons in the brain, which generates electrical activity [1]. To capture this signal, researchers use electroencephalography (EEG), defined as the measurement of electrical activity produced by the brain [2]. The concept of interfacing a computer directly with the brain is relatively new, but the analysis of brainwaves has been reported since 1929 [3]. Nowadays, controlling devices with the mind is a controversial but highly researched topic. Devices such as smartphones, laptops, tablets, and even televisions could be operated this way by people with disabilities, for whom these technologies could be the only means of communicating with the external environment. A brain-computer interface (BCI) is defined as a device that measures the activity of the brain or central nervous system and converts these signals into artificial output [4]. Knowledge of the EEG signal can be applied in a wide range of applications [5], but BCI is not an easy task. BCI research requires expertise in many different fields, such as signal processing, computer science, computational neuroscience, and embedded intelligent systems.
Processing EEG signals so that they can be used in applications is not easy. Apart from technical problems such as effective electrode placement and the impedance between the scalp and the electrodes, the signal processing tasks are also difficult. One of these problems is the choice of feature extraction method. Even a simple classifier can produce a highly accurate system if it is fed high-quality features. For this reason, feature extraction is crucial in any classification problem.
The frequency band and PCA methods are widely used in DSP, and for EEG signals specifically. Recent work in [6] used PCA to recognize digits from EEG signals. The researchers used MDB data collected with a five-channel device called Insight and showed that a PCA-based method yielded good accuracy, around 84%. In [2], a Multilayer Perceptron (MLP) was used to recognize digits from EEG signals. The data in that experiment came from MindBigData (MDB) and were collected with a four-channel device called Muse. The best accuracy found was 27%, with a non-boosted MLP. Ref. [7] attempted to recognize digits from EEG signals using a CNN and obtained an accuracy of around 27-34%, also on MDB data collected with the Muse device. Ref. [8] is another EEG study, which used power spectral density to detect pleasure and displeasure states, with a highest accuracy of 99.3%. However, there has been no direct comparison between the frequency band and PCA methods on the same problem with the same data and research environment. For this reason, this study compares both methods on the task of recognizing digits from EEG signals. This research is expected to inform the choice of feature extraction method for EEG signal problems, so that it can be used in real applications such as a BCI that detects digit signals.

Research Methods
This section explains the stages carried out in the research. Classification research generally consists of four major steps: data acquisition, preprocessing, feature extraction, and testing. Note that there is no separate training stage in this research, because KNN is a so-called lazy learner algorithm. The step emphasized in this research is feature extraction using the frequency band and PCA methods.

Data Acquisition
The data used in this research are EEG signals labeled with a digit, available on the MDB website. The website hosts four datasets collected with four different devices: Mindwave, EPOC, Insight, and Muse. Papers such as [2], [6], and [7] have used these datasets. They are secondary data collected by another researcher. This research used the data collected with the EPOC device for the main experiment. The website provides the EEG signals in CSV format in files with a .txt extension. The EPOC dataset contains 910,476 rows in total, labeled from -1 to 9. Label -1 indicates that the subject had a random thought, and the other labels indicate that the subject thought of the corresponding digit. The data were collected from a single subject with a healthy brain. EPOC has 14 channels, and each channel produces a comma-separated list of decimal values.

Each row of the dataset contains the following fields:

Id: for reference only
Event: distinguishes between measurement events
Device: a character identifying the device used in the measurement
Channel: a string identifying the 10/20 brain location of the signal
Code: the label, which can be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or -1
Size: the size of the recorded signal
Data: the amplitudes resulting from the measurement

Only the EPOC data follow the recommended international 10/20 electrode placement [9]. One subject was stimulated with a digit from 0 to 9 for 2 s and recorded with the EPOC headset. Figure 3 shows the electrode placement standard in detail. EPOC, with 14 channels, satisfies this standard, which is why it was used in this research. To allow a fair comparison with other research papers, this experiment also uses the Muse dataset from MDB. The experiment used all the data collected with Muse, 163,932 in total. Both the EPOC and Muse measurements used the same subject and were collected by the same researcher; the only differences are the device and its channels. More details about the data can be found at http://www.mindbigdata.com/opendb/.

Sampling and Fixed Length
Given the size of the data obtained, sampling was employed to speed up the research. For each label, 5600 rows of data were taken, 56,000 in total. In theory, the EPOC sample rate is 128 Hz [11], so a 2 s recording should contain 256 values; in practice the recorded signal length can differ. To handle this, each signal was padded with 0 or trimmed to a fixed length of 256 values per 2 s.
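The padding and trimming step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `fix_length` is hypothetical, and padding or trimming at the end of the signal is an assumption, since the text does not say which side was used.

```python
import numpy as np

TARGET_LEN = 256  # 128 Hz x 2 s, as stated in the text


def fix_length(signal, target_len=TARGET_LEN):
    """Zero-pad at the end, or trim from the end, to exactly target_len values.

    Assumption: padding/trimming happens at the tail of the signal.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - signal.size), constant_values=0.0)


# Synthetic stand-ins for a too-short and a too-long recording.
short = fix_length(np.ones(250))
long_ = fix_length(np.arange(300))
print(short.shape, long_.shape)
```

Both outputs have 256 values, so every channel row can later be stacked into a regular array.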

Flattening and Normalization
Since every 14 rows of data represent one measurement, the data were flattened. Flattening converts the data into a 1-dimensional array for input to the next layer [12]. After this process, the data had dimension (400, 3584), that is, 400 measurements of 14 channels × 256 values each, after which min-max normalization was applied. Min-max normalization performs a linear transformation of the original data, preserving the relative relationships among values before and after the process [13]. Equation 1 shows the min-max normalization formula, and the accompanying figure explains step by step the procedure followed in this research. Note that the normalization parameters are computed on the training data; the testing data are normalized using the parameters obtained from the training data.
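A sketch of flattening followed by min-max normalization is given below. It assumes the common form of the formula, x' = (x − min) / (max − min), applied per feature; the text does not state whether the minimum and maximum were taken per feature or over the whole dataset, so that choice is an assumption here. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def flatten_measurements(rows, n_channels=14):
    """Concatenate every n_channels consecutive channel rows into one vector."""
    rows = np.asarray(rows, dtype=float)
    n_measurements = rows.shape[0] // n_channels
    return rows[: n_measurements * n_channels].reshape(n_measurements, -1)


def fit_min_max(train):
    """Learn the per-feature minimum and range from the training data only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant features
    return lo, span


def apply_min_max(data, lo, span):
    """Apply x' = (x - min) / (max - min) with previously fitted statistics."""
    return (data - lo) / span


rng = np.random.default_rng(0)
rows = rng.normal(size=(5600, 256))   # synthetic: 400 measurements x 14 channels
X = flatten_measurements(rows)        # shape (400, 3584)
X_train, X_test = X[:280], X[280:]
lo, span = fit_min_max(X_train)
X_train_n = apply_min_max(X_train, lo, span)
X_test_n = apply_min_max(X_test, lo, span)  # reuses the training statistics
```

Because the test data reuse the training minimum and range, normalized test values may fall slightly outside [0, 1].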

Frequency Band
Frequency is one of the most important criteria for assessing abnormalities in clinical EEGs and for understanding functional behaviors in cognitive research. There are five major brain waves, distinguished by their frequency ranges. From low to high frequency, these bands are typically categorized as 0.5-4 Hz (delta, δ), 4-8 Hz (theta, θ), 8-13 Hz (alpha, α), 13-30 Hz (beta, β), and >30 Hz (gamma, γ) [14]. For example, alpha waves often appear in eyes-closed, waking, and relaxed conditions; beta waves often arise when the person is thinking; theta waves, in a range of 4-7 Hz, usually occur when someone is in light sleep, sleepy, or stressed; and delta waves, in the range of 0.5-3 Hz, are often present in a person in deep sleep [15]. The FFT was employed to convert the time-domain signal to the frequency domain. For each band, the power spectrum, power ratio, and spectral entropy were then calculated [17].
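The band features can be sketched as below. The exact definitions used in [17] are not given in the text, so the ones here are assumptions: band power is the sum of squared FFT magnitudes inside the band, power ratio is band power divided by the total power over all five bands, and spectral entropy is the Shannon entropy of the normalized power inside the band. The gamma band is capped at the Nyquist frequency of 64 Hz.

```python
import numpy as np

FS = 128  # EPOC sampling rate in Hz, as stated in the text
# Half-open intervals [lo, hi); the gamma upper edge is the Nyquist frequency.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 64)}


def band_features(signal, fs=FS):
    """Return [power, power ratio, spectral entropy] for each of the five bands."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    masks = [(freqs >= lo) & (freqs < hi) for lo, hi in BANDS.values()]
    total = sum(power[m].sum() for m in masks)
    feats = []
    for m in masks:
        p = power[m]
        band_power = p.sum()
        ratio = band_power / total if total > 0 else 0.0
        prob = p / band_power if band_power > 0 else np.zeros_like(p)
        nz = prob[prob > 0]                      # skip zero bins in the log
        entropy = float(-(nz * np.log2(nz)).sum())
        feats.extend([band_power, ratio, entropy])
    return np.array(feats)


# A pure 10 Hz sine should put almost all power in the alpha band.
t = np.arange(256) / FS
feats = band_features(np.sin(2 * np.pi * 10 * t))

# One measurement: 14 channels x 15 features = 210 values.
rng = np.random.default_rng(0)
measurement = rng.normal(size=(14, 256))
features = np.concatenate([band_features(ch) for ch in measurement])
```

Under these assumptions each channel contributes 5 × 3 = 15 values, so a 14-channel measurement yields 210 features.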

Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a technique that transforms several possibly correlated variables into a smaller number of variables called principal components [18]. PCA has many goals, including finding relationships between observations, extracting the most important information from the data, detecting and removing outliers, and reducing the dimension of the data by keeping only the important information [19]. First, the covariance matrix of the data matrix (X) is calculated. Second, the eigenvalues and eigenvectors of the covariance matrix are calculated.
The computation of PCA is described in detail in [20]. Figure 6 shows how PCA transforms data from a higher dimension to a lower dimension using just one component.
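The two steps described above (covariance matrix, then eigenvalues and eigenvectors) can be sketched with NumPy as follows. The function names and the synthetic data are illustrative; the full procedure is the one in [20].

```python
import numpy as np


def pca_fit(X, n_components):
    """PCA via eigendecomposition of the covariance matrix of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)        # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # step 2: eigenpairs (ascending)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]


def pca_transform(X, mean, components):
    """Project centered data onto the selected principal components."""
    return (X - mean) @ components


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
mean, components, variances = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, components)          # shape (200, 2)
```

The variance of each projected column equals the corresponding eigenvalue, which is why the eigenvalues are used to rank the components.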

K-Nearest Neighbor (KNN)
The KNN algorithm runs in two steps: first it finds the k nearest neighbors of a data point, and then it classifies the data point into a particular class based on those neighbors. To find the neighbors, it uses a distance metric such as the Euclidean distance, as given in equation 5 [22].
It chooses the k nearest samples from the training set and then takes the majority vote of their classes, where k should be an odd number to avoid ambiguity.
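A minimal KNN sketch with the Euclidean distance, d(x, y) = sqrt(Σ (x_i − y_i)²), is shown below. The function name and the toy data are hypothetical, and the tie-breaking rule (the class seen first among the nearest neighbors wins) is an assumption, since the text does not specify one.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    preds = []
    for x in X_test:
        dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distance
        nearest = np.argsort(dist)[:k]
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X_train, y_train, np.array([[0.2, 0.1], [5.5, 5.2]]), k=3)
print(pred)
```

An odd k rules out tied votes only when there are two classes; with ten digit labels a tie among three neighbors is still possible, which is why a tie-breaking rule is needed.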

Testing Method
In testing, 10-fold cross-validation (CV) was used. K-fold CV is a standard procedure that splits the data randomly and evenly into K parts. The training set is built from K − 1 parts of the dataset. The prediction accuracy of the candidate model is then evaluated on a test set containing the data in the held-out part [23]. For each fold, accuracy is then calculated using equation 6:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$
where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives [24].
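The splitting and scoring procedure can be sketched as follows. The helper names are hypothetical and the random seed is arbitrary. For a multi-class problem such as digit classification, accuracy is computed here as the fraction of correctly predicted samples, which coincides with (TP + TN) / (TP + TN + FP + FN) in the two-class case.

```python
import numpy as np


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs from a random, even K-way split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


splits = list(k_fold_indices(400, k=10))
sizes = [(len(tr), len(te)) for tr, te in splits]
score = accuracy([1, 2, 3, 4], [1, 2, 0, 4])
```

With 400 samples and K = 10, each fold trains on 360 samples and tests on the remaining 40; the per-fold accuracies are then averaged.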

Result and Discussion
This section reports the feature extraction and its evaluation using 10-fold validation and the accuracy metric.

Feature extraction using Frequency Band
To extract the frequency band features, each channel in the data was transformed into the frequency domain using the FFT. The FFT produced a very large magnitude at zero frequency, which obscured the detail in the rest of the spectrum. To solve this, a DC removal operation was applied.
After the flattening process, the dimension of the data became (400, 210), which is consistent with 14 channels × 5 bands × 3 features per band. The flattened result was then normalized using equation (1) and was ready for use in KNN classification.
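The effect of DC removal can be illustrated with a short sketch. It assumes that DC removal means subtracting the signal mean before the FFT, which is one common way to do it; the text does not say which operation was used. The offset of 4200 and the 10 Hz component are arbitrary synthetic values.

```python
import numpy as np


def magnitude_spectrum(signal, remove_dc=True):
    """FFT magnitude, optionally after subtracting the mean (DC component)."""
    signal = np.asarray(signal, dtype=float)
    if remove_dc:
        signal = signal - signal.mean()  # assumption: DC removal = mean subtraction
    return np.abs(np.fft.rfft(signal))


t = np.arange(256) / 128.0
raw = 4200.0 + 10.0 * np.sin(2 * np.pi * 10 * t)  # large offset plus a 10 Hz wave
with_dc = magnitude_spectrum(raw, remove_dc=False)
without_dc = magnitude_spectrum(raw, remove_dc=True)
print(with_dc.argmax(), without_dc.argmax())
```

Without DC removal the largest magnitude sits in the zero-frequency bin; after it, the peak moves to the bin of the actual oscillation (bin 20, that is, 10 Hz at 0.5 Hz resolution).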

Feature extraction using PCA
To process the data with PCA, the flattened and normalized data were used so that each measurement forms a single vector and all features carry balanced weight. PCA was then applied. PCA transforms the original data into principal components. The number of principal components is the key factor when using PCA features in a classification problem. Selecting the optimal number of components improves the chance of a good experimental result. One important consideration is the cumulative explained variance. Keeping the cumulative explained variance as close as possible to that of the original data gives a reduced dimension while preserving the original variance. To this end a small experiment was conducted, and the result is shown in Figure 9. The graph shows that 186 components give 99% of the cumulative explained variance.
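Choosing the number of components from the cumulative explained variance can be sketched as below. The function name is hypothetical and the data are synthetic, so the resulting count is not the 186 reported for the EPOC data. The singular values of the centered data are used because their squares are proportional to the eigenvalues of the covariance matrix.

```python
import numpy as np


def components_for_variance(X, threshold=0.99):
    """Smallest number of components whose cumulative explained variance
    reaches the threshold, together with the cumulative curve."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, cumulative.size), cumulative


rng = np.random.default_rng(0)
# Two dominant directions and four nearly flat ones (synthetic example).
X = rng.normal(size=(400, 6)) * np.array([10.0, 5.0, 0.1, 0.1, 0.1, 0.1])
n_components, cumulative = components_for_variance(X, threshold=0.99)
print(n_components)
```

Plotting `cumulative` against the component index gives the kind of curve the text describes for Figure 9.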

Result and Analysis
The first assessment was of the frequency band features. KNN was employed with 210 features and 400 samples in total. 70% of the 400 samples were used for training and 30% for testing, with ten labels. Each label had the same number of samples in both training and testing. Ten-fold validation was also applied to give a stable result, and the selected k was 3.

Table 1. Frequency band and KNN
The second experiment used PCA. The same portion of data and the same parameters were used in this experiment. The average accuracy with PCA was [...] for training and 12.3% for testing. Although the average accuracy with PCA is clearly better than with the frequency band method on both the training and testing sets under 10-fold validation, a hypothesis test is still needed to establish whether this result is statistically significant. Before the test, the normality of the results was checked using the Shapiro-Wilk test, since the sample size is less than 50. The result can be seen in Figure 10. The test indicated that the results are not normally distributed (a sig. (p) value below 0.05 rejects normality), so a parametric statistical method could not be used. The paired Wilcoxon test was used instead, since both methods, frequency band and PCA, used the same data for training and testing.

Figure 11. Paired Wilcoxon rank result
The results show that, going from frequency band to PCA, training accuracy yielded three negative ranks, with an average decrease of 2.17. Positive ranks show that 5 results had better training accuracy after using PCA for feature extraction, with an average increase of 5.90. The remaining two results were ties. Testing accuracy from frequency band to PCA showed two negative ranks, with an average reduction of 3 points. Positive ranks show that 8 results had better testing accuracy after using PCA, with an average improvement of 6.13. Figure 12 shows that there is no significant difference between the training accuracy of the frequency band and PCA methods, since sig. (2-tailed) is higher than 0.05. In contrast, the testing results show a significant difference between the frequency band and PCA methods, since 0.027 is lower than 0.05. Descriptively, the experiment shows that the PCA-based method gives better accuracy than the frequency band method. The Wilcoxon test also indicates a significant difference in accuracy between the methods at the 95% confidence level. It can therefore be said that the PCA-based method is significantly better than the frequency-based method.

Although the accuracy of both methods is lower than that reported in other research, comparing across studies is biased because the other studies used different datasets. For example, [2] and [7] used MDB data collected with the Muse device. The research in [2] achieved an accuracy of around 27% using the non-boosted MLP method. In their research, data with label -1 (random thought), which are more numerous than data with labels 0-9, could bias the interpretation, since the data are imbalanced and accuracy can be a misleading metric in that case [25].
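The negative ranks, positive ranks, and ties reported for the paired Wilcoxon test can be reproduced in principle with the sketch below. It is a simplified version under stated assumptions: zero differences are dropped, tied absolute differences receive average ranks, and the two-sided p-value uses the normal approximation without tie or continuity correction, so it may differ slightly from the output of a statistics package. The per-fold accuracies are made-up numbers, not the values from this study.

```python
import math

import numpy as np


def wilcoxon_signed_rank(before, after):
    """Paired Wilcoxon signed-rank summary (normal approximation, two-sided)."""
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    n_ties = int(np.sum(d == 0))
    d = d[d != 0]                       # zero differences carry no rank
    n = d.size
    abs_d = np.abs(d)
    order = np.argsort(abs_d)
    sorted_abs = abs_d[order]
    ranks = np.empty(n)
    i = 0
    while i < n:                        # assign average ranks to equal |d|
        j = i
        while j + 1 < n and sorted_abs[j + 1] == sorted_abs[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    w_pos = float(ranks[d > 0].sum())
    w_neg = float(ranks[d < 0].sum())
    p = 1.0
    if n > 0:
        mean = n * (n + 1) / 4.0
        sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
        z = (min(w_pos, w_neg) - mean) / sd
        p = math.erfc(abs(z) / math.sqrt(2.0))
    return {"n_negative": int(np.sum(d < 0)), "n_positive": int(np.sum(d > 0)),
            "n_ties": n_ties, "w_neg": w_neg, "w_pos": w_pos, "p": p}


# Hypothetical per-fold test accuracies (percent) for two feature sets.
freq_band = [10.0] * 10
pca = [13.0, 9.0, 15.0, 16.0, 8.0, 17.0, 18.0, 14.0, 19.0, 20.0]
result = wilcoxon_signed_rank(freq_band, pca)
print(result)
```

A p-value below 0.05 is read as a significant difference between the paired accuracies at the 95% confidence level.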
Another problem is that the MDB data collected with Muse do not follow the international 10/20 electrode placement, since the device has only 4 channels. The research in [26] provides evidence that the international 10/20 electrode placement can give better results in EEG data analysis. Nevertheless, the experiment was also conducted with the Muse dataset so that a comparison could be made with other research papers. Like [2] and [7], the experiment used the entire Muse dataset to obtain a fair comparison. Table 3 shows that, on average, the frequency band method achieves an accuracy of 31% and PCA achieves 24.8%. This result suggests that the frequency band method is better than the PCA method at classifying digits 0-9 and label -1 (random thought) on the Muse dataset. The frequency band method also produces better accuracy than the non-boosted MLP in [2] and is competitive with the experiment in [7]. It is important to note, however, that in the Muse experiment data with label -1 make up 27% of the overall dataset. This differs from the EPOC experiment, which considered only data with labels 0-9 and balanced the data at 40 samples per label, 400 in total. Hence, the EPOC result cannot be compared directly with the Muse result. In the Muse experiment, label -1 (random thought) was left as in the original data, that is, imbalanced in size. In addition, EPOC has 14 channels while Muse has only four, which makes the comparison unfair. The accuracy found here is also lower than that reported in [6]. This might be due to differences in the data and in the testing procedure. Overall, however, the experiment shows that the PCA-based method is not always better at classifying digits from EEG signals, as the results in [6] might suggest.

Conclusion
The PCA method differs significantly in accuracy from the frequency band method on the EPOC dataset with labels 0-9. PCA yielded 12.3% accuracy on average, and the frequency band method only 9%. At the 95% confidence level, there was a significant difference in accuracy between the PCA and frequency band methods on the EPOC dataset. On the other hand, testing on the Muse dataset, with data labeled 0-9 and -1 for random thought, produced an average accuracy of 31% for the frequency band method and 25% for PCA. Compared with the results in [2] and [7], the frequency band experiment produces a competitive result. Compared to [6], however, the accuracy in this experiment is lower. This might be due to differences in the data and in the testing technique. Overall, considering both datasets used here, it can be concluded that there is no winning method, because each dataset favors a specific method. Even though both datasets are intended for digit classification, factors such as the device channels and the imbalanced size of the data can lead to different results. In the future, channel analysis and better treatment of the dataset are needed, since neither method shows results good enough for use in an application, and different datasets should be used to obtain better generalization.