Implementation of Sample Bootstrapping for Resampling Pap Smear Single Cell Dataset

The purpose of this study was to determine the effect of using Sample Bootstrapping to resample the Herlev dataset on the performance of single-cell pap smear classification, by addressing the data imbalance problem. The Herlev dataset used in this study consists of 917 records with 20 attributes. The class labels in the dataset were imbalanced, which affected single-cell pap smear classification performance. Data imbalance causes machine learning algorithms to perform poorly on the minority class because they are overwhelmed by the majority class. To overcome this, the data can be resampled with Sample Bootstrapping. The results of Sample Bootstrapping were evaluated using the Artificial Neural Network and K-Nearest Neighbors classification methods. Both seven-class and two-class classification were used. The classification results using these two methods showed an increase in accuracy, precision, and recall values. The performance improvement reached 10.82% for the two-class classification and 35% for the seven-class classification. It was concluded that Sample Bootstrapping was good and robust in improving the classification method.


Introduction
Imbalance in classification data causes machine learning algorithms to perform poorly on the minority class because they are overwhelmed by the majority class [1]. Studies have addressed data imbalance in two main ways. The first is to change the class distribution through resampling, and the second is to set different priorities by modifying the algorithm structure [2]. Imbalanced data occurs often in machine learning application research [1], [3]-[7]. The problem affects prediction accuracy because predictions are biased towards the majority class, while accurate prediction of the minority class is sometimes also required. One solution is to use a resampling technique [7]. Resampling techniques have received particular attention in big data [8]-[11]. Resampling can increase accuracy on the minority class because it balances the number of minority-class samples against the majority class [3], [12], [13].
The Herlev dataset contains single-cell pap smear data with seven diagnostic classes. However, the class sizes are unbalanced, and the majority class is far larger than the minority class. The Superficial Epithelial class has 74 samples, the Intermediate Epithelial class has 70, the Columnar Epithelial class has 98, the Mild Light Dysplasia class has 182, the Severe Dysplasia class has 146, the Moderate Dysplasia class has 197, and the Carcinoma In Situ class has 150. The seven class labels can be grouped into two groups, normal and abnormal. With seven classes, the majority class is Moderate Dysplasia (197) and the minority class is Intermediate Epithelial (70), a gap of 127 samples. Unbalanced data affects classification performance (accuracy, precision, and recall) because it is difficult to find information on minority classes [14]. With two classes, the sizes are also disproportionate: 242 for the normal category and 675 for the abnormal category.
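The class sizes quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are those reported in the text, while the assignment of the three epithelial classes to the "normal" group is an assumption (it is the grouping that reproduces the reported total of 242).

```python
# Class counts as reported in the text for the seven diagnostic classes.
counts = {
    "Superficial Epithelial": 74,
    "Intermediate Epithelial": 70,
    "Columnar Epithelial": 98,
    "Mild Light Dysplasia": 182,
    "Severe Dysplasia": 146,
    "Moderate Dysplasia": 197,
    "Carcinoma In Situ": 150,
}

# Assumption: the three epithelial classes make up the "normal" group.
normal_classes = {
    "Superficial Epithelial",
    "Intermediate Epithelial",
    "Columnar Epithelial",
}

total = sum(counts.values())
majority = max(counts, key=counts.get)
minority = min(counts, key=counts.get)
gap = counts[majority] - counts[minority]

normal = sum(n for name, n in counts.items() if name in normal_classes)
abnormal = total - normal

print(total, majority, minority, gap, normal, abnormal)
```

Under that grouping assumption, the totals agree with the figures given in the text: 917 samples overall, a gap of 127 between the largest and smallest class, and 242 normal against 675 abnormal samples.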
Several studies have shown that resampling techniques can improve classification performance. One commonly used resampling technique is Sample Bootstrapping. The method has several advantages: it does not require any assumptions about the distribution of the data, it can resample the data up to thousands of times even when the number of samples is limited, and its calculation is simple [15], [16]. Thanathamathee and Lursinap [12] used Sample Bootstrapping to resample data for classification on the Monk2 dataset. That study showed that resampling with Sample Bootstrapping increased accuracy from 82.13% to 85.96%. Al-Luhaybi et al. [17] also used Sample Bootstrapping to resample student datasets at Brunel University for classification, and accuracy increased from 75.59% to 93.1% after resampling. Several other studies also used Sample Bootstrapping as a resampling method for classification [18]-[21].
Kurniawati [22] studied the Herlev dataset by applying SVM to seven-class cervical cancer classification without resampling, which resulted in a low accuracy of 78.67%. Kusy et al. [23] also classified cervical cancer data using an artificial neural network without resampling and obtained an accuracy of only 71.87%. Several studies on pap smears for detecting cervical cancer disorders used two classes, including Bora et al. [24], who applied the KNN method and reported accuracy, precision, and recall values above 80%. Likewise, Oka et al. [25] used two classes with the Artificial Neural Network method and obtained a result of 88.8%. These studies suggest that the performance of the classification method was influenced by the data imbalance problem.
The Herlev dataset had unbalanced class sizes. This study focused on applying resampling with the Sample Bootstrapping method to classify cervical cancer. Sample Bootstrapping was applied to the Herlev pap smear single-cell data to classify the types of cervical cancer disorders. The results were evaluated with the Artificial Neural Network and K-Nearest Neighbors classification methods to determine the extent to which Sample Bootstrapping was able to improve the performance of the classification method.

Research Methods
For the training and testing process, the data was split by 10-fold cross-validation. Algorithm performance was measured based on the accuracy, precision, and recall of ANN and KNN methods.

Dataset
The dataset used was the Herlev dataset, developed by the pathology department of Herlev University Hospital together with the automation department of the Danish Technical University. The dataset consisted of 917 single-cell images, which had been classified into seven classes by cyto-technicians and specialists [26]. It has 20 attributes, which are described in Table 1.

Implementation of Sample Bootstrapping
Sample Bootstrapping (SB) was implemented in the pre-processing stage, before the data entered the classification process. Sample Bootstrapping is a method used to estimate the standard error of a statistic [28]. It is a statistical procedure that randomly replicates the existing sample data (resampling) to obtain new simulated data. Sample Bootstrapping samples with replacement: records are drawn at random from the original data together with their labels, each record has an equal chance of being selected, and a record that has been selected can be selected again in a later draw [29]. Based on several studies, the advantages of Sample Bootstrapping are the ability to study any statistic of interest and to handle sampling error by creating a specific model [30]. Sample Bootstrapping does not reduce errors in the data; it only estimates the standard error [20]. The steps of the Sample Bootstrapping method were [31]:
a. Construct an empirical distribution $\hat{F}_n$ of the sample by assigning a probability of 1/n to each data point $X_i$, for i=1,2,3,…,n.
b. Draw a bootstrap sample of size n at random, with replacement, from the distribution in step a.
c. Compute the statistic $\hat{\theta}$ from the bootstrap sample; this replication is referred to as $\hat{\theta}_1^*$.
d. Repeat steps b and c B times to obtain $\hat{\theta}_1^*, \hat{\theta}_2^*, \ldots, \hat{\theta}_B^*$.
e. Estimate the standard error ($se_B$) using the standard deviation of the B replications with Equation 1.
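The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the function name, the seed, and the toy data are hypothetical, and the standard error is assumed to be the usual sample standard deviation of the B replications, since Equation 1 itself is not reproduced in this text.

```python
import random
import statistics


def bootstrap_standard_error(data, statistic, n_replications=1000, seed=0):
    """Estimate the standard error of `statistic` by bootstrapping `data`."""
    rng = random.Random(seed)
    n = len(data)
    replications = []
    for _ in range(n_replications):
        # Steps a-b: every observation has probability 1/n of being drawn,
        # and n observations are drawn with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        # Step c: compute the statistic on this bootstrap sample.
        replications.append(statistic(sample))
    # Step e (assumed form): sample standard deviation of the B replications.
    return statistics.stdev(replications), replications


# Toy data, purely for illustration.
data = list(range(1, 21))
se, reps = bootstrap_standard_error(data, statistics.mean, n_replications=500)
print(f"bootstrap standard error of the mean: {se:.3f}")
```

For the mean, the estimate should land close to the population standard deviation divided by the square root of n, which gives a quick sanity check on the procedure.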

Evaluation of Sample Bootstrapping Using Classification Methods
To evaluate the performance of Sample Bootstrapping, the resampled data were passed to classification methods and the classification results were analyzed. Classification is the process of forming a model to predict an unknown class pattern [32]. In this study, the classification methods used for evaluation were ANN and KNN.

Artificial Neural Networks
An Artificial Neural Network (ANN) is a method that can process large data [33]. The ANN consisted of several layers, namely the input layer, hidden layer, and output layer. One hidden layer was used, with a learning rate of 0.01 and 200 training cycles. The ANN architecture used can be seen in Figure 3. This study used the backpropagation ANN, one of the most frequently used algorithms [34].
Here $z\_in_j$ was the net input to the j-th hidden unit (j=1,2,3,…,n), $v_{0j}$ was the bias weight for unit $z_j$, and $v_{ij}$ was the weight for unit $x_i$. The data that came out of the hidden layer to the output layer was then calculated by Equation 4.
$z_j = s(z\_in_j)$ (4)
where s was the activation function used in the hidden layer. After all the data was calculated, it proceeded to the next layer.
e. Calculate the forwarded data on the output layer with Equation 5, where $y\_in_k$ was the k-th data input at the output layer and $w_{0k}$ was the bias weight to the output unit.
f. Calculate the data that comes out as output with Equation 6.
$y_k = s(y\_in_k)$ (6)
g. Prepare for the backpropagation stage.
h. Calculate all output data ($y_k$, k=1,2,3,…,m) for all target patterns. Calculate the error factor ($\delta_k$) with Equation 7.
$\delta_k = (t_k - y_k)\, s'(y\_in_k)$ (7)
where $\delta_k$ was the error used when the layer weights change and $t_k$ was the output target. Next, update the weight $w_{jk}$ by calculating the change in weight with the learning rate $\alpha$ using Equation 8.
$\Delta w_{jk} = \alpha\, \delta_k\, z_j$ (8)
Update the bias $w_{0k}$ by calculating the change in bias using Equation 9.
$\Delta w_{0k} = \alpha\, \delta_k$ (9)
Then the calculated value was sent to the previous layer.
i. Calculate each input data from the output layer with Equation 10. The input data obtained was then multiplied by the derivative of the activation function using Equation 11.
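The forward and backward passes described above can be illustrated with a small pure-Python sketch of one-hidden-layer backpropagation. Several things here are assumptions rather than details from the study: the sigmoid activation, the textbook form of the hidden-layer error term, the weight initialization range, the number of hidden units, and the toy OR-style patterns. Only the learning rate (0.01) and the number of training cycles (200) follow the text.

```python
import math
import random


def sigmoid(a):
    """Assumed activation function s; its derivative is s(a) * (1 - s(a))."""
    return 1.0 / (1.0 + math.exp(-a))


def train_step(x, t, v, w, alpha):
    """One forward and backward pass for a single pattern.

    v[j] = [v_0j, v_1j, ...] are hidden-unit weights (bias first);
    w[k] = [w_0k, w_1k, ...] are output-unit weights (bias first).
    Returns the squared error measured before the update.
    """
    # Forward pass: net input and activation of each hidden unit.
    z_in = [vj[0] + sum(xi * vij for xi, vij in zip(x, vj[1:])) for vj in v]
    z = [sigmoid(a) for a in z_in]
    # Forward pass: net input and activation of each output unit.
    y_in = [wk[0] + sum(zj * wjk for zj, wjk in zip(z, wk[1:])) for wk in w]
    y = [sigmoid(a) for a in y_in]

    # Output error term: delta_k = (t_k - y_k) * s'(y_in_k).
    delta_k = [(tk - yk) * yk * (1.0 - yk) for tk, yk in zip(t, y)]
    # Hidden error term (assumed textbook form): the errors sent back from
    # the output layer are summed, then multiplied by s'(z_in_j).
    delta_j = []
    for j, zj in enumerate(z):
        delta_in = sum(dk * wk[j + 1] for dk, wk in zip(delta_k, w))
        delta_j.append(delta_in * zj * (1.0 - zj))

    # Weight and bias updates scaled by the learning rate alpha.
    for wk, dk in zip(w, delta_k):
        wk[0] += alpha * dk
        for j, zj in enumerate(z):
            wk[j + 1] += alpha * dk * zj
    for vj, dj in zip(v, delta_j):
        vj[0] += alpha * dj
        for i, xi in enumerate(x):
            vj[i + 1] += alpha * dj * xi

    return sum((tk - yk) ** 2 for tk, yk in zip(t, y))


rng = random.Random(0)
n_inputs, n_hidden, n_outputs = 2, 3, 1  # hypothetical toy sizes
v = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_outputs)]
patterns = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]

errors = []
for cycle in range(200):  # 200 training cycles, learning rate 0.01
    errors.append(sum(train_step(x, t, v, w, alpha=0.01) for x, t in patterns))
print(f"first cycle error {errors[0]:.4f}, last cycle error {errors[-1]:.4f}")
```

The hidden-layer error terms are computed before any weight is changed, so the backward pass uses the same weights as the forward pass.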

K-Nearest Neighbors
The second method used for classification was K-Nearest Neighbors (KNN). KNN is an algorithm that works on the shortest distance from the query instance to the training samples. The goal is to classify an object based on its attributes and the training samples. For this research, the value of k used was k=5. The following are the steps in the KNN algorithm [36]:
a. Determine the parameter k to be used.
b. Calculate the distance between the new data and all training data using the Euclidean distance with Equation 16 [32].
$d_i = \sqrt{\sum_{i=1}^{p} (x_{1i} - x_{2i})^2}$ (16)
where $d_i$ was the distance between $x_1$ and $x_2$, $x_1$ was the sample data, $x_2$ was the test data, i was the data variable, and p was the number of data variables.
c. Sort the distances from smallest to largest and determine the nearest neighbors as the k samples with the smallest distances.
d. Assign the class that occurs most often among those k nearest neighbors.
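The KNN steps above map directly onto a short function. The sketch below is illustrative only: the function names and the two-dimensional toy points are hypothetical, and k=5 follows the value stated in the text.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=5):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Step b: distance from the query to every training sample.
    distances = [(euclidean(x, query), label) for x, label in zip(train_x, train_y)]
    # Step c: sort ascending and keep the k smallest distances.
    nearest = [label for _, label in sorted(distances)[:k]]
    # Step d: most frequent class among the neighbors. For equal counts,
    # Counter.most_common keeps the label that was encountered first,
    # which here is the label of the closer neighbor.
    return Counter(nearest).most_common(1)[0][0]


# Toy training data, purely for illustration.
train_x = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
train_y = ["normal"] * 4 + ["abnormal"] * 4

pred_low = knn_predict(train_x, train_y, (1.5, 1.5), k=5)
pred_high = knn_predict(train_x, train_y, (8.5, 8.5), k=5)
print(pred_low, pred_high)
```

With k=5 and four points per cluster, each query collects all four points of its own cluster plus one from the other, so the vote is four to one.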

Algorithm Performance Assessment
Algorithm performance assessment was based on the confusion matrix produced by the training and testing process. The confusion matrix has two classes, namely the positive class and the negative class. A true positive (TP) is a positive prediction that is correct, and a false positive (FP) is a positive prediction that is wrong. Likewise, a true negative (TN) is a negative prediction that is correct, and a false negative (FN) is a negative prediction that is wrong. If the case has more than two classes, one class becomes the positive class and the rest become the negative class.
The confusion matrix was used to calculate the accuracy, which measures the proportion of correct classification results. From the same matrix, precision measures how many of the predicted positives are actually positive, and recall measures the ratio of the relevant items selected to all actual relevant items [37].
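As an illustration of these measures, the sketch below counts TP, FP, TN, and FN for one chosen positive class (one-vs-rest, as described for the multi-class case) and derives accuracy, precision, and recall using their usual definitions. The function name and the toy labels are hypothetical, and the formulas are assumed to be the standard ones since they are not written out in this text.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, precision, and recall with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # positive prediction that is correct
            else:
                fp += 1  # positive prediction that is wrong
        else:
            if actual == positive:
                fn += 1  # negative prediction that is wrong
            else:
                tn += 1  # negative prediction that is correct
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels, purely for illustration: 6 abnormal and 4 normal samples.
y_true = ["abnormal"] * 6 + ["normal"] * 4
y_pred = ["abnormal"] * 5 + ["normal"] + ["normal"] * 3 + ["abnormal"]

accuracy, precision, recall = one_vs_rest_metrics(y_true, y_pred, positive="normal")
print(accuracy, precision, recall)
```

In this toy case the minority class "normal" has TP=3, FP=1, FN=1, and TN=5, so accuracy is 0.8 while precision and recall are both 0.75, which shows how accuracy alone can look better than the minority-class measures.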

Result and Discussion
In this study, Sample Bootstrapping was used to resample the single-cell pap smear dataset before classification using ANN and KNN. Following the research method described above, the data were first preprocessed using the SB method for sampling and then classified with ANN and KNN. In the sampling process, the sample size parameter was set as relative, that is, as a ratio (0-1) of the original data size. The resampled data was then validated using n-fold cross-validation with n = 10. The data set was divided into ten partitions, nine as training data and one as test data. This process was repeated ten times so that each of the ten partitions was used once as test data. These stages are shown as a flowchart in Figure 4. The test was carried out on a dataset group with two classes and a group with seven classes. In the seven-class classification, the data were grouped based on all types of pap smear cell images in Table 2. In the two-class classification, the seven classes were grouped into two categories, normal and abnormal, as in Table 2.
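The 10-fold partitioning described above can be sketched as follows. This is an illustration under assumptions: the function name and seed are hypothetical, and the records are shuffled without stratification because the text does not say how the partitions were formed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: plain shuffle, not stratified
    # Deal the shuffled indices into k partitions of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # The other k-1 partitions together form the training data.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 917 records, as in the dataset described in the text.
splits = list(k_fold_indices(917, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each record appears in exactly one test partition, so after ten rounds every record has been used once for testing and nine times for training.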

Classification without Sample Bootstrapping
The classification methods were used to analyze the results of applying Sample Bootstrapping (SB). To be fair, both methods used the same parameters with and without Sample Bootstrapping. The number of data before using SB can be seen in Figure 5. By category, the normal class had 242 data and the abnormal class had 675 data. With 10-fold cross-validation, the classification results using the original dataset, without SB, can be seen in Table 4. Based on Table 5, the seven-class classification had lower accuracy, precision, and recall values. This could have happened because the amount of data in each of the seven classes was smaller than in each of the two classes.

Classification with Sample Bootstrapping
The resampled dataset had the same size as the original dataset. In this study, the number of samples was set as relative with a ratio of 1. Resampling with SB was done five times. The number of data in each class after the SB process can be seen in Figure 6. The performance of Sample Bootstrapping on the ANN and KNN methods with the two types of classification can be seen in Table 5. Based on Figure 6, the minority class gained data, so the class sizes were not as far apart, even though the gap to the majority class was still wide. By category, the number of data became 269 for the normal class and 648 for the abnormal class. Based on Table 5, classification with seven classes still produced lower performance scores than with two classes, but the results were much better than before. A comparison of results on the use of SB is discussed further in the next section.
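A resampling step with a relative ratio of 1 can be sketched as below. This is a hedged illustration, not the tool or settings used in the study: the function name, the seed, and the placeholder records are hypothetical, and only the class sizes (242 normal, 675 abnormal) come from the text.

```python
import random
from collections import Counter


def bootstrap_resample(records, ratio=1.0, seed=0):
    """Draw round(len(records) * ratio) records with replacement."""
    rng = random.Random(seed)
    size = round(len(records) * ratio)
    # Every record has the same chance of being drawn on each draw,
    # and a record may be drawn more than once.
    return [records[rng.randrange(len(records))] for _ in range(size)]


# Placeholder records: (label, record id). Only the class sizes follow the text.
records = [("normal", i) for i in range(242)] + [("abnormal", i) for i in range(675)]

resampled = bootstrap_resample(records, ratio=1.0)
class_counts = Counter(label for label, _ in resampled)
print(len(resampled), dict(class_counts))
```

With a ratio of 1 the resampled set has the same size as the original, but the per-class counts shift from run to run because the draws are random. The counts printed here depend on the seed and are not expected to match the 269 and 648 reported in the text.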

Comparison of Results
To show the effect of SB more clearly, the comparison was divided into two tables, one for two classes and the other for seven classes. The comparison for two classes can be seen in Table 6. Applying SB before classification increased the accuracy value and especially the recall value; the higher the recall, the better the machine is at finding information about a class. The recall value increased because the machine recognized the minority class, whereas it was previously biased towards the majority class. Based on the classification method used, SB worked better on KNN than on ANN: the increase for KNN was more than 5% in all performance values. The comparison for seven classes can be seen in Table 7. In Table 7, the SB method was much better in the seven-class classification, with a difference of up to 35%. The large increases in accuracy, precision, and recall indicate that resampling using SB greatly improved classification results on unbalanced data. Although the resampled data (Figure 6) was still not very balanced, it produced an excellent classification. Based on the method used, KNN also had better performance values than ANN, and the highest difference was in the KNN method.
Although the performance with seven classes increased, the values produced were not as good as in the two-class classification. However, the SB method worked very well with KNN on both classifications because it increased the accuracy value considerably. This showed that SB was very good at improving the performance of the KNN method for seven classes. To analyze the results further, they were compared with previous studies. The comparison of research results for single-cell pap smear classification can be seen in Table 8. In Table 8, it can be seen that several studies used resampling techniques for unbalanced data.
Other studies reported only differences in accuracy values, although accuracy alone is not enough to determine whether an algorithm works well. If the majority class is very large, the machine may predict only the majority class, and a good accuracy value can still occur because the minority samples that are predicted wrongly are few. From Table 8, it can also be seen that only the proposed method reports the other performance values, which is an advantage of this research. In addition, this study had the highest improvement compared to the others. Although the two-class classification using the ANN method was still lower than in Arifin and Rachman's research [39], this study showed increases in other performance values not reported in that study. From this comparison, it can be concluded that SB was very good and robust in improving the classification method.

Conclusion
The Sample Bootstrapping method was very good and robust for resampling on an imbalanced data problem, as indicated by the improved classification performance in this study. The highest increase occurred using the KNN method, in both the two-class and seven-class classification. The largest difference was for KNN on the seven-class classification, with increases of 35.9% for accuracy, 33.87% for precision, and 34.67% for recall. With this significant increase, it can be concluded that Sample Bootstrapping can improve the classification of labels that have many classes.