Dimensionality Reduction using PCA and K-Means Clustering for Breast Cancer Prediction

  • Ade Jamal (Universitas Al-Azhar Indonesia)
  • Annisa Handayani
  • Ali Akbar Septiandri
  • Endang Ripmiatin
  • Yunus Effendi


Breast cancer is a leading cause of cancer death among women. Predicting breast cancer at an early stage increases the chance of a cure. This requires a prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign one. In this paper, two machine learning algorithms, Support Vector Machine and Extreme Gradient Boosting, are compared for this classification task. Prior to classification, the number of data attributes is reduced from the raw data by extracting features using Principal Component Analysis. A clustering method, K-Means, is also used for dimensionality reduction as an alternative to Principal Component Analysis. This paper presents a comparison of four models, formed by combining the two dimensionality reduction methods with the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison is measured using accuracy, sensitivity, and specificity, evaluated from the confusion matrices. The experimental results indicate that K-Means, which is not usually used for dimensionality reduction, can perform well compared with the popular Principal Component Analysis.
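
The three evaluation metrics named in the abstract can be read directly off a binary confusion matrix. The following is a minimal sketch, not the authors' implementation. It assumes that malignant is the positive class (encoded as 1) and benign the negative class (encoded as 0). The names `ConfusionMatrix` and `confusion_from_labels` are hypothetical, and the labels in the usage lines are made up for illustration only; they are not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (assumption: positive = malignant)."""
    tp: int  # malignant tumors predicted as malignant
    fn: int  # malignant tumors predicted as benign
    fp: int  # benign tumors predicted as malignant
    tn: int  # benign tumors predicted as benign

    def accuracy(self) -> float:
        # Fraction of all cases that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def sensitivity(self) -> float:
        # True positive rate: fraction of malignant cases that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: fraction of benign cases correctly cleared.
        return self.tn / (self.tn + self.fp)


def confusion_from_labels(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> ConfusionMatrix:
    """Tally a confusion matrix from true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fn=fn, fp=fp, tn=tn)


# Illustrative labels only (1 = malignant, 0 = benign); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
cm = confusion_from_labels(y_true, y_pred)
print(cm.accuracy(), cm.sensitivity(), cm.specificity())
```

Reporting sensitivity and specificity alongside accuracy matters in a screening setting, because a missed malignant tumor (a false negative) and a false alarm on a benign tumor (a false positive) carry very different costs, and accuracy alone does not separate the two.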




How to Cite
JAMAL, Ade et al. Dimensionality Reduction using PCA and K-Means Clustering for Breast Cancer Prediction. Lontar Komputer : Jurnal Ilmiah Teknologi Informasi, [S.l.], p. 192-201, dec. 2018. ISSN 2541-5832. Available at: <https://ojs.unud.ac.id/index.php/lontar/article/view/42796>. Date accessed: 31 mar. 2023. doi: https://doi.org/10.24843/LKJITI.2018.v09.i03.p08.
