Sales Forecasting Using Random Forest Regression with Particle Swarm Optimization on Superstore Sales
Abstract
In the current digital era, the ever-increasing volume of data highlights the significance of Big Data: data whose scale, variety, and complexity pose challenges for storage, analysis, and visualization. Accurate sales forecasting is crucial in a competitive and dynamic business environment and provides critical insights for companies across various sectors. Tree-based machine learning algorithms, such as Random Forest and Gradient Boosting, are widely used for this purpose. However, Random Forest is susceptible to overfitting, which is a major challenge in its use, so hyperparameter optimization becomes essential to improve the quality of the results and their fit to the data. Particle Swarm Optimization (PSO) is one technique that can be used for this optimization. This research evaluates the performance of the Random Forest Regression algorithm optimized with PSO for Superstore sales forecasting and compares it with Grid Search and Randomized Search. The PSO-optimized model achieved an error value of 187.68 on the entire training data and 254.32 on the entire testing data. Grid Search achieved a better error value than PSO, but the difference was not significant. PSO also had the shortest optimization time of the compared methods, 40 minutes 42 seconds, whereas Grid Search took up to 710 minutes 31.5 seconds. In addition, PSO lets users easily tune the hyperparameters of the optimization algorithm itself, which can lead to better optimization results. Black box testing demonstrates that the system functions effectively according to user requirements.
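To make the optimization step concrete, the following is a minimal sketch of PSO searching over two hyperparameters. It is not the implementation used in this research. The objective `toy_validation_error` is a hypothetical stand-in for the validation error of a Random Forest Regression model, and the names `pso_minimize`, `inertia`, `cognitive`, and `social`, the search bounds, and all default values are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its personal best; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_score = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + social * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            score = objective(pos[i])
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score < gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score


def toy_validation_error(params):
    """Hypothetical stand-in for a model's validation error (not real data)."""
    n_estimators, max_depth = params
    return (n_estimators - 300) ** 2 / 1e4 + (max_depth - 12) ** 2


# Assumed search ranges for (n_estimators, max_depth).
bounds = [(10, 500), (2, 30)]
best_params, best_error = pso_minimize(toy_validation_error, bounds)
print(best_params, best_error)
```

In an actual experiment, the stand-in objective would be replaced by a function that rounds the candidate values to integers, trains a Random Forest regressor with them, and returns its validation error. The arguments `inertia`, `cognitive`, `social`, `n_particles`, and `n_iters` are the kind of optimizer-level hyperparameters that PSO lets users tune.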