XGBOOST WITH RANDOM SEARCH HYPER-PARAMETER TUNING FOR PHISHING WEBSITE CLASSIFICATION
Phishing is a form of cybercrime that harms its victims and is against the law. There are several approaches to combating phishing, one of which is to classify phishing websites using machine learning methods. The dataset used is the phishing websites dataset from the UCI Machine Learning Repository, with 11,055 instances and 30 categorical features. The classifier used is XGBoost. XGBoost performs well on data with categorical features, but its performance can still be improved. XGBoost has several hyper-parameters that can be configured to improve model performance, and the problem of identifying good values for them is called hyper-parameter tuning. To address this, this study applies hyper-parameter tuning with Random Search, run for 30 iterations and validated using 5-fold cross-validation. The tuned XGBoost hyper-parameters are n_estimators, max_depth, subsample, and learning_rate. XGBoost without hyper-parameter tuning obtained an accuracy of 95.34%, while XGBoost with hyper-parameter tuning obtained an accuracy of 97.69%. Hyper-parameter tuning with Random Search on XGBoost for phishing website classification therefore improves accuracy by 2.35 percentage points.
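The tuning procedure described above (30 random configurations, each scored by 5-fold cross-validation) can be sketched in plain Python as follows. This is only an illustrative outline, not the implementation used in the study: the search ranges in SEARCH_SPACE are hypothetical, since the abstract names the four hyper-parameters but not the ranges searched, and toy_score is a stand-in for "train XGBoost on the training folds and return accuracy on the held-out fold".

```python
import random
from statistics import mean

# Hypothetical search ranges (assumption): the abstract lists the four
# hyper-parameters but does not state the ranges that were sampled.
SEARCH_SPACE = {
    "n_estimators": lambda rng: rng.randint(50, 500),
    "max_depth": lambda rng: rng.randint(3, 10),
    "subsample": lambda rng: rng.uniform(0.5, 1.0),
    "learning_rate": lambda rng: rng.uniform(0.01, 0.3),
}


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def random_search(fit_and_score, n_samples, n_iter=30, k=5, seed=0):
    """Draw n_iter random configurations and keep the one with the best
    mean k-fold score. fit_and_score(params, train_idx, test_idx) must
    return a score where higher is better (for example, accuracy)."""
    rng = random.Random(seed)
    folds = k_fold_indices(n_samples, k, rng)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Sample one value per hyper-parameter, independently at random.
        params = {name: draw(rng) for name, draw in SEARCH_SPACE.items()}
        scores = []
        for i, test_idx in enumerate(folds):
            # Train on every fold except fold i, evaluate on fold i.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(fit_and_score(params, train_idx, test_idx))
        score = mean(scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params, train_idx, test_idx):
    """Stand-in scorer (assumption): replace with real model training."""
    return (1.0
            - abs(params["learning_rate"] - 0.1)
            - 0.01 * abs(params["max_depth"] - 6))


best_params, best_score = random_search(toy_score, n_samples=11055)
print(best_params, round(best_score, 4))
```

To reproduce the setting in the abstract, toy_score would be replaced by a function that fits an XGBoost classifier with the sampled parameters on train_idx and returns its accuracy on test_idx. A library routine such as scikit-learn's RandomizedSearchCV could play the same role; the abstract does not say which implementation was used.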