Implementation of Sample Bootstrapping for Resampling Pap Smear Single Cell Dataset

Anita Desiani; Azhar Kholiq Affandi; Shania Putri Andhini; Sugandi Yahdin; Yuli Andirani; Muhammad Arhami

doi:10.24843/LKJITI.2022.v13.i02.p01

Anita Desiani, Universitas Sriwijaya
Azhar Kholiq Affandi
Shania Putri Andhini, Universitas Sriwijaya
Sugandi Yahdin
Yuli Andirani, Mathematics Department, Mathematics and Natural Science Faculty, Universitas Sriwijaya
Muhammad Arhami

Abstract

This study examined how resampling the Harlev dataset with Sample Bootstrapping affects single-cell pap smear classification performance by addressing the data imbalance problem. The Harlev dataset used in this study consists of 917 records with 20 attributes. The class labels in the dataset are imbalanced, which affected single-cell pap smear classification performance. Class imbalance causes machine learning algorithms to perform poorly on the minority class because they are overwhelmed by the majority class. To overcome this, the data can be resampled with Sample Bootstrapping. The results of Sample Bootstrapping were evaluated using the Artificial Neural Network and K-Nearest Neighbors classification methods, for both seven-class and two-class classification. With both methods, classification on the resampled data showed an increase in accuracy, precision, and recall. The performance improvement reached 10.82% for the two-class classification and 35% for the seven-class classification. It was concluded that Sample Bootstrapping was effective and robust in improving the performance of the classification methods.
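The resampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact resampling scheme or tooling the authors used, so the code below assumes one common variant: each class is drawn with replacement until it matches the majority-class size. The function name `bootstrap_balance`, the `seed` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def bootstrap_balance(features, labels, seed=0):
    """Resample every class with replacement up to the majority-class size.

    Assumption: balancing to the majority size is one possible scheme;
    the paper's abstract does not specify the target class sizes.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)

    target = max(len(rows) for rows in by_class.values())

    new_features, new_labels = [], []
    for label, rows in by_class.items():
        # rng.choices draws with replacement, which is the bootstrap step.
        drawn = rng.choices(rows, k=target)
        new_features.extend(drawn)
        new_labels.extend([label] * target)
    return new_features, new_labels


# Toy data (made up, not the Harlev dataset): 6 "normal" vs 2 "abnormal" cells.
features = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.1], [2.4]]
labels = ["normal"] * 6 + ["abnormal"] * 2

balanced_x, balanced_y = bootstrap_balance(features, labels)
print(Counter(balanced_y))
```

In this toy run both classes end up with six rows each, and every resampled row is a copy of an original row. When such a step is combined with a classifier, it is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.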
