Dissolved Oxygen Prediction of the Ciliwung River using Artificial Neural Networks, Support Vector Machine, and Streeter-Phelps

Abstract: Evaluation of river water quality


Introduction
Water is a compound that plays a vital role in human life. It is used for household, industrial, and other needs, and the demand for water grows with the population. However, population growth also lowers water quality through various human activities [1]. Water quality declines further when water sources are contaminated by domestic and industrial waste [2]. Water quality management is the effort to maintain water so that its quality meets the requirements of its designated use and stays close to its natural state [3].
JURNAL ILMIAH MERPATI, Vol. 10, No. 3, December 2022, p-ISSN: 2252-3006, e-ISSN: 2685-2411. Author: Yonas Prima Arga Rumbyarso.
One frequently used source of water is the river. Rivers play an important role in people's lives: about 76% of river water is used to meet household needs [4]. Poor river water quality reduces the number of river biota, which further degrades water quality in downstream areas and ultimately in the sea [5]. Water must therefore meet the quality requirements stipulated in the quality standard for its designated use. The Ciliwung River Basin (DAS) has a strategic function in a national context and therefore requires specific management. The Ciliwung River is about 117 km long from its upstream reaches to Jakarta Bay, with a catchment area of 347 km². The Ciliwung watershed extends from the upstream Puncak area in Bogor Regency to the downstream area at Jakarta Bay [6]. The damage to the Ciliwung River has been caused by human activities around the watershed. Policies that neglected environmental aspects are alleged to be among the factors behind the degradation of the watershed. This is exacerbated by industrial waste in the middle segment of the watershed and by domestic waste from people living along the river, which further degrades the water quality of the Ciliwung [7]. One way to evaluate the water quality of the Ciliwung River is to analyze the distribution of dissolved oxygen (DO) along it. Oxygen (O2) is essential for life and a critical constraint on aquatic ecosystems [8] [9].
DO is one of the most important water quality parameters because it indicates the level of pollution in a water body. The DO concentration reflects the balance between oxygen-producing and oxygen-consuming processes, which depends strongly on temperature, salinity, oxygen depletion, oxygen sources, and other water quality parameters [10]. Water quality is governed by many factors with complex, non-linear interrelationships that traditional data processing methods can no longer handle adequately. Speed, accuracy, time efficiency, and cost also constrain river water quality monitoring. New approaches to water quality monitoring, such as the use of artificial intelligence, are therefore needed [11]. The purposes of this research are to analyze the environmental parameters that influence the distribution of dissolved oxygen, to build models that predict the distribution of dissolved oxygen in the Ciliwung River using an artificial neural network (ANN) and a support vector machine (SVM), and to compare the DO distributions modeled by ANN, SVM, and the Streeter-Phelps model. The results of this study can provide information on the quality of river water as a source of water for daily activities.

Research Method and Materials
The stages of this research are summarized in the flowchart in Figure 1. This research uses a Ciliwung River water quality dataset to model dissolved oxygen using ANN. The variables used are those that affect the DO value of river water: power of hydrogen (pH), temperature, turbidity, chemical oxygen demand (COD), and biochemical oxygen demand (BOD). The dataset consists of primary data obtained from research by Astono [12], Hendrawan [13], Rahman [14], and Soewandita [7], together with secondary data obtained by the researchers. It contains 55 records; a sample is shown in Table 1. The data used to train the ANN and SVM consist of six parameters, divided into input data (the five parameters above) and output data (DO). Building the model requires a correlation analysis between the parameters, whose purpose is to determine whether each relationship is positive or negative. Before modeling, a preprocessing stage handles the missing values in the dataset, which amount to 28.8%. The missing values are handled with a machine learning approach using SVM, ANN, and linear regression algorithms. For DO prediction, the ANN, SVM, and Streeter-Phelps models are compared. The Streeter-Phelps modeling requires the deoxygenation constant and the re-aeration constant, both of which are calculated with empirical formulas based on the hydraulic parameters flow velocity and flow depth. Model performance is assessed using the coefficient of determination (R²) and the root mean square error (RMSE).
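As a rough illustration of the correlation step, the following minimal NumPy sketch computes a correlation coefficient between each input parameter and DO and reports its sign. It assumes Pearson's coefficient (the text does not name the measure), a hypothetical column order, and synthetic data generated inside the script; none of the numbers are Ciliwung measurements.

```python
import numpy as np

# Parameter names follow the text; this column order is an assumption.
COLUMNS = ["pH", "temperature", "turbidity", "COD", "BOD", "DO"]


def correlation_with_do(data: np.ndarray) -> dict:
    """Pearson correlation of each input column with DO (the last column)."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables -> (6, 6) matrix
    return {name: float(corr[i, -1]) for i, name in enumerate(COLUMNS[:-1])}


def relationship_sign(r: float, tol: float = 1e-9) -> str:
    """Classify a coefficient as a positive, negative, or no linear relationship."""
    if r > tol:
        return "positive"
    if r < -tol:
        return "negative"
    return "none"


# Synthetic example (NOT measured data): DO is built to fall with temperature and BOD.
rng = np.random.default_rng(42)
n = 55
ph = rng.uniform(6.0, 8.5, n)
temp = rng.uniform(22.0, 32.0, n)
turb = rng.uniform(0.0, 110.0, n)
cod = rng.uniform(5.0, 60.0, n)
bod = rng.uniform(1.0, 20.0, n)
do = 12.0 - 0.2 * temp - 0.1 * bod + rng.normal(0.0, 0.1, n)
data = np.column_stack([ph, temp, turb, cod, bod, do])

for name, r in correlation_with_do(data).items():
    print(f"{name:12s} r = {r:+.2f} ({relationship_sign(r)})")
```

Pearson's r measures only linear association, so a peaked relationship such as the one between pH and DO can give a coefficient near zero even when the dependence is strong; binning DO by parameter range may be more informative in that case.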

Literature Study
3.1. Water quality
Water quality is a term describing the condition of water with respect to its designated use, such as drinking water, fisheries, irrigation, or industry [15]. Water quality has three kinds of characteristics: physical, chemical, and biological. Water quality parameters include pH, color, electrical conductivity, temperature, concentrations of chemical substances, bacterial concentration, and so on. The following parameters affect water quality:
a. Dissolved Oxygen
Dissolved oxygen (DO) is the amount of oxygen dissolved in water. All microorganisms in the water need DO for respiration and metabolism, which supply the energy for their growth and reproduction [16]. The amount of oxygen microorganisms need depends strongly on the volume of water and the type of organic matter it contains. Consequently, the inflow of organic waste from household, industrial, mining, and agricultural activities reduces O2 levels in water [17]. In general, the higher the DO value of a water body, the better its water quality.
b. Power of Hydrogen
Power of hydrogen (pH), or degree of acidity, expresses the acidity or alkalinity of a liquid and represents the concentration of hydrogen ions in a solution. pH is an important parameter in water quality analysis because it strongly influences the biological and chemical processes in the water [18]. Water originating in the mountains usually has a high pH, which decreases as the water flows downstream because added organic matter releases CO2. The pH value greatly affects the physical, chemical, and biological processes of organisms living in the water. It also strongly influences the toxicity of pollutants and the solubility of some gases, and determines the form in which substances occur in water [19].
c. Biochemical Oxygen Demand
Biochemical oxygen demand (BOD) is the amount of dissolved oxygen that microorganisms in an aquatic environment need to decompose the organic waste it contains. BOD measures the oxygen consumed by microbial populations in response to an inflow of biodegradable organic matter [20]. BOD is expressed in milligrams of oxygen per liter (mg/L).
d. Chemical Oxygen Demand
Chemical oxygen demand (COD) is the amount of oxygen needed to oxidize all the organic matter in water. COD is determined with potassium dichromate under hot, acidic conditions with a silver sulfate catalyst, so that both easily and poorly degradable organic matter are oxidized [21].
e. Temperature
Temperature is a measure of the hotness or coldness of a system or object [22]. It affects chemical reactions and biological activity in water and plays a key role in controlling aquatic ecosystems, especially the survival of organisms. A rise in water temperature decreases the amount of dissolved oxygen, speeds up chemical reactions, and disrupts the life of fish and other aquatic animals [23]. Higher temperatures also accelerate the microbial decomposition of organic matter. In addition, river water temperature is a limiting factor for aquatic organisms [24]. Changes in surface temperature can therefore affect the physical, chemical, and biological processes in the water [25].
f. Turbidity
Turbidity is the cloudiness of water caused by fine particles that cannot be seen individually with the naked eye. Turbidity itself is not a harmful property of water, but it can raise concern that hazardous chemical compounds are present. Turbidity is generally caused by suspended particles such as clay, silt, dissolved organic matter, bacteria, plankton, and other organisms, and is usually expressed in nephelometric turbidity units (NTU). According to WHO (1998), the maximum turbidity of drinking water is 5 NTU.
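The inverse relationship between temperature and dissolved oxygen can be quantified through the saturation DO concentration, the maximum DO that fresh water holds at equilibrium with air. The sketch below uses the Benson-Krause equation in the form adopted by APHA Standard Methods, assuming zero salinity and a pressure of 1 atm; the paper does not state that it uses this formula, so treat it as background rather than as part of the study's method.

```python
import math


def do_saturation(temp_c: float) -> float:
    """Saturation DO (mg/L) of fresh water at 1 atm.

    Benson-Krause / APHA form: ln(Cs) is a 4th-order polynomial in 1/T,
    with T in kelvin. Assumes zero salinity and standard pressure.
    """
    t_k = temp_c + 273.15
    ln_cs = (-139.34411
             + 1.575701e5 / t_k
             - 6.642308e7 / t_k ** 2
             + 1.243800e10 / t_k ** 3
             - 8.621949e11 / t_k ** 4)
    return math.exp(ln_cs)


for t in (20.0, 25.0, 30.0):
    print(f"{t:.0f} °C: {do_saturation(t):.2f} mg/L")
```

This gives about 9.1 mg/L at 20 °C, falling to roughly 7.6 mg/L at 30 °C. The difference between the saturation value and the measured DO is the oxygen deficit that Streeter-Phelps modeling tracks.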

ANN and SVM modeling
An artificial neural network (ANN) is an information processing system inspired by the biological neural networks of the human body, so it shares many of their characteristics [26]. ANN is also described as a generalized mathematical model of human biological neural networks [27]. ANNs are often used to develop predictive models because they can model complex problems that are very difficult to express as mathematical equations [28]. An ANN can recognize and learn the relationship between a system's inputs and outputs without explicitly representing the underlying physical processes [29]. The ANN architecture is shown in Figure 2.
A problem can be modeled with an ANN using the backpropagation method, an ANN technique with forward and backward passes based on an error backpropagation algorithm with error correction [30]. The network consists of several layers: the input layer is connected to the hidden layer, and the hidden layer is connected to the output layer. An input pattern is passed to the hidden layer and forwarded to the output layer. If the output differs from the expected value, the error is propagated backwards to the hidden layer and then to the input layer, and the weights are corrected accordingly [31].
Support vector machine (SVM) is a supervised learning method used for classification (support vector classification) and regression (support vector regression, SVR). SVR applies the SVM algorithm, originally developed for classification, to regression problems: SVM classification produces discrete class labels, whereas SVR produces real-valued outputs [32]. SVR can produce good forecasts because it is designed to limit overfitting. Overfitting occurs when a model reaches almost perfect accuracy on the training data but generalizes poorly to new data. In classification, SVM searches for the best separating line, or hyperplane, by maximizing the margin, which is the distance from the hyperplane to the closest data points; these closest points are called support vectors [33]. SVR applies the same idea by fitting a function that keeps the training points within a tolerance margin around the prediction.
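The forward and backward passes described for backpropagation can be illustrated with a minimal NumPy sketch of a network with one hidden layer. It assumes ReLU activation, a mean-squared-error loss, and plain full-batch gradient descent; the function name `train_mlp` and the layer size, learning rate, and epoch count are illustrative choices, not the study's configuration (which used the Adam solver).

```python
import numpy as np


def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a 1-hidden-layer ReLU regressor by backpropagation; return a predict function."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    W1 = rng.normal(0.0, 0.5, (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)

    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer (ReLU) -> linear output layer.
        z1 = X @ W1 + b1
        h = np.maximum(z1, 0.0)
        out = h @ W2 + b2

        # Backward pass: gradient of the mean squared error, propagated from the
        # output layer back through W2 and the ReLU derivative to the input weights.
        err = 2.0 * (out - y) / n
        grad_W2 = h.T @ err
        grad_b2 = err.sum(axis=0)
        delta_h = (err @ W2.T) * (z1 > 0)
        grad_W1 = X.T @ delta_h
        grad_b1 = delta_h.sum(axis=0)

        # Error correction: gradient-descent update of all weights and biases.
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

    def predict(X_new):
        return (np.maximum(X_new @ W1 + b1, 0.0) @ W2 + b2).ravel()

    return predict


# Usage on synthetic data (not the study's dataset): a linear target with an intercept.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + 1.0
predict = train_mlp(X_demo, y_demo)
print("training MSE:", np.mean((predict(X_demo) - y_demo) ** 2))
```

Each loop iteration performs one forward pass and one backward pass; the hidden-layer gradients are obtained by multiplying the output error back through the output weights and masking it with the ReLU derivative.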

Streeter-Phelps modeling
The Streeter-Phelps model determines the pollution load carrying capacity of a water body by applying a mathematical mass balance, assuming one-dimensional, steady-state conditions. It is based on the oxygen sag curve equation [34]. Streeter-Phelps modeling is generally limited to two processes:
a. The decrease in dissolved oxygen (deoxygenation) due to bacterial degradation of organic matter in the water.
b. The increase in dissolved oxygen (reaeration) caused by turbulence in the river flow.
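The oxygen sag curve has a well-known closed-form solution. With L0 the initial ultimate BOD, D0 the initial oxygen deficit, kd the deoxygenation constant, ka the reaeration constant, and t = x/u the travel time to distance x at velocity u, the deficit is D(t) = kd*L0/(ka - kd) * (exp(-kd*t) - exp(-ka*t)) + D0*exp(-ka*t), and DO = Cs - D(t), where Cs is the saturation concentration. The Python sketch below evaluates this classical form and its critical (minimum-DO) time; the function names and example values are hypothetical, and the specific equations the study used for kd and ka are not reproduced here.

```python
import math


def sag_deficit(t: float, l0: float, d0: float, kd: float, ka: float) -> float:
    """Oxygen deficit D(t) in mg/L after travel time t (days), classical Streeter-Phelps.

    l0: initial ultimate BOD (mg/L); d0: initial deficit (mg/L);
    kd, ka: deoxygenation and reaeration constants (1/day), with ka != kd.
    """
    if math.isclose(ka, kd):
        raise ValueError("this closed form requires ka != kd")
    return (kd * l0 / (ka - kd)) * (math.exp(-kd * t) - math.exp(-ka * t)) + d0 * math.exp(-ka * t)


def sag_do(x_m: float, u_m_s: float, cs: float, l0: float, d0: float, kd: float, ka: float) -> float:
    """DO (mg/L) at distance x_m downstream: Cs - D(t), with travel time t = x / u in days."""
    t_days = x_m / u_m_s / 86400.0
    return cs - sag_deficit(t_days, l0, d0, kd, ka)


def critical_time(l0: float, d0: float, kd: float, ka: float) -> float:
    """Travel time (days) at which the deficit peaks, i.e. where dD/dt = 0."""
    arg = (ka / kd) * (1.0 - d0 * (ka - kd) / (kd * l0))
    if arg <= 0:
        raise ValueError("no interior maximum: the deficit decreases from the start")
    # A negative result likewise means the deficit is largest at t = 0.
    return math.log(arg) / (ka - kd)


# Hypothetical example values, not taken from the study.
tc = critical_time(l0=10.0, d0=1.0, kd=0.3, ka=0.7)
print(f"critical time: {tc:.2f} days, deficit there: {sag_deficit(tc, 10.0, 1.0, 0.3, 0.7):.2f} mg/L")
print(f"DO 5 km downstream at 0.5 m/s: {sag_do(5000.0, 0.5, 8.0, 10.0, 1.0, 0.3, 0.7):.2f} mg/L")
```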

Performance evaluation
Model performance is assessed using the coefficient of determination (R²) and the root mean square error (RMSE). An R² > 0.7 indicates a very good model, whereas a model with R² < 0.4 should not be used for prediction.
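The two criteria can be computed as follows. This is a minimal NumPy sketch; it assumes R² is defined as 1 - SS_res/SS_tot (the same definition as scikit-learn's `r2_score`), since the text does not say whether the squared correlation coefficient is meant instead. The helper `rate_model` is hypothetical and encodes only the two thresholds stated above.

```python
import numpy as np


def rmse(observed, predicted) -> float:
    """Root mean square error, in the units of the data (mg/L for DO)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def r_squared(observed, predicted) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rate_model(r2: float) -> str:
    """Apply the stated thresholds; the 0.4-0.7 band is not classified in the text."""
    if r2 > 0.7:
        return "very good"
    if r2 < 0.4:
        return "should not be used"
    return "not classified"


obs = [6.1, 5.4, 4.8, 7.0]       # illustrative values only
pred = [6.0, 5.6, 4.7, 6.8]
print(rmse(obs, pred), r_squared(obs, pred), rate_model(r_squared(obs, pred)))
```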

Correlation analysis of water quality parameters
The correlation analysis of the water quality parameters is interpreted as follows. Dissolved oxygen and temperature are among the parameters that receive the most attention because they reflect water quality [14]. Dissolved oxygen, temperature, and pH are key water quality parameters that control the distribution of organisms in the water. The minimum concentration limits and the role of dissolved oxygen in aquatic ecosystems indicate the ability of a water body to adapt to pollutants. Dissolved oxygen is therefore a valid indicator of the water quality of a water body and is closely related to pollutants. pH has a non-linear relationship with dissolved oxygen. Relatively high DO levels occur in the pH range of 6.5 to 8.0: DO reaches its highest values when pH is close to 7 and gradually decreases as pH rises above 7. Water with a near-neutral pH therefore has a greater potential to hold more dissolved oxygen.
Relatively high DO levels are found at temperatures of 20 to 25°C, with DO remaining fairly high as the temperature approaches 25°C and decreasing once the water temperature exceeds 25°C. Water temperature thus plays an important role in sustaining aquatic ecosystems. Turbidity has an inverse relationship with DO within the range of 0 to 110 NTU, and relatively high DO levels occur at turbidities of 0 to 50 NTU: the clearer the water, the higher its DO content. Turbidity reduces the ability of water to transmit sunlight, and the reduced light intensity can hamper photosynthesis by phytoplankton.

Data Preprocessing
In the dataset, the parameters with missing values are temperature, turbidity, BOD, and COD. The missing values are predicted with machine learning using the SVM, ANN, and linear regression (LR) algorithms. The three algorithms were compared, and the best-performing algorithm was selected to predict the missing values of each parameter: LR for temperature (R² = 0.645), SVM for turbidity and BOD, and ANN for COD. Table 2 compares the three algorithms for each parameter, and Table 3 shows the results of filling in the missing values with the selected algorithm for each parameter.
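The per-parameter selection can be sketched with scikit-learn as below. The sketch assumes that each incomplete parameter is predicted from columns that are complete in the same rows, that candidates are ranked by R² on a held-out 20% of the complete rows, and that "SVM" means support vector regression; the helper names `impute_column` and `candidate_models` and all hyperparameters are hypothetical, not the study's settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def candidate_models(seed: int = 0) -> dict:
    """LR, SVM and ANN regressors; the hyperparameters are assumptions."""
    return {
        "LR": LinearRegression(),
        "SVM": make_pipeline(StandardScaler(), SVR(kernel="linear")),
        "ANN": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=seed),
        ),
    }


def impute_column(X: np.ndarray, target: int, predictors: list, seed: int = 0):
    """Fill NaNs in column `target` with the candidate scoring the best held-out R^2.

    Returns (filled copy of X, name of the selected model, dict of R^2 scores).
    """
    X = X.copy()
    target_missing = np.isnan(X[:, target])
    predictors_complete = ~np.isnan(X[:, predictors]).any(axis=1)
    train_rows = ~target_missing & predictors_complete  # rows usable for fitting
    fill_rows = target_missing & predictors_complete    # rows that can be filled

    A, y = X[train_rows][:, predictors], X[train_rows, target]
    A_tr, A_te, y_tr, y_te = train_test_split(A, y, test_size=0.2, random_state=seed)

    scores, fitted = {}, {}
    for name, model in candidate_models(seed).items():
        model.fit(A_tr, y_tr)
        scores[name] = r2_score(y_te, model.predict(A_te))
        fitted[name] = model

    best = max(scores, key=scores.get)
    if fill_rows.any():
        X[fill_rows, target] = fitted[best].predict(X[fill_rows][:, predictors])
    return X, best, scores


# Synthetic example (not the study's data): column 2 depends on columns 0 and 1.
rng = np.random.default_rng(0)
demo = rng.normal(size=(55, 3))
demo[:, 2] = 1.5 * demo[:, 0] - 0.5 * demo[:, 1] + rng.normal(0.0, 0.05, 55)
demo[rng.choice(55, 12, replace=False), 2] = np.nan
filled, best, scores = impute_column(demo, target=2, predictors=[0, 1])
print("selected:", best, {k: round(v, 3) for k, v in scores.items()})
```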

ANN and SVM modeling
The prediction models were built with the ANN and SVM algorithms. The ANN model has three kinds of layers: an input layer, hidden layer(s), and an output layer. Two hidden-layer configurations were compared: one hidden layer with 100 neurons and two hidden layers with 100 neurons each, both using the ReLU activation function and the Adam solver. The SVM uses a linear kernel, and both the ANN and the SVM are trained for 1000 iterations. The data are split 80:20 into training and testing sets. Table 4 compares the ANN and SVM results. In predicting dissolved oxygen, SVM outperforms the ANN with both one and two hidden layers, achieving an RMSE of 0.101 and an R² of 0.998.
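The configuration described above maps naturally onto scikit-learn estimators, as in the sketch below. The layer sizes, ReLU activation, Adam solver, linear kernel, 1000-iteration limit, and 80:20 split follow the text; input standardization, the random seed, the helper names, and the synthetic data are assumptions added for illustration, so the printed scores are not those of Table 4.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def ann(layers: tuple, seed: int) -> MLPRegressor:
    """ReLU + Adam network limited to 1000 iterations, as described in the text."""
    return MLPRegressor(hidden_layer_sizes=layers, activation="relu",
                        solver="adam", max_iter=1000, random_state=seed)


def build_models(seed: int = 0) -> dict:
    # Standardizing the inputs is an assumption; the text does not mention scaling.
    return {
        "ANN, 1 hidden layer": make_pipeline(StandardScaler(), ann((100,), seed)),
        "ANN, 2 hidden layers": make_pipeline(StandardScaler(), ann((100, 100), seed)),
        "SVM, linear kernel": make_pipeline(StandardScaler(), SVR(kernel="linear", max_iter=1000)),
    }


def compare(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Fit each model on an 80:20 train/test split and report test RMSE and R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    results = {}
    for name, model in build_models(seed).items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        results[name] = {
            "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
            "R2": float(r2_score(y_te, pred)),
        }
    return results


# Synthetic stand-ins for pH, temperature, turbidity, COD, BOD and DO (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(55, 5))
y = 7.0 - 0.8 * X[:, 1] - 0.5 * X[:, 4] + rng.normal(0.0, 0.1, 55)
results = compare(X, y)
for name, m in results.items():
    print(f"{name}: RMSE={m['RMSE']:.3f}, R2={m['R2']:.3f}")
```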

Comparison of ANN, SVM and Streeter-Phelps modeling
The Streeter-Phelps method is one of the most frequently used water quality modeling methods. Here, Streeter-Phelps modeling uses several empirical equations, each defining a different test model. The analytical calculations are based on the hydraulic factors that affect dissolved oxygen levels, namely the distance, depth, and flow velocity at each point or segment.
Streeter-Phelps modeling involves calculating the deoxygenation constant and the reaeration constant, both from river hydraulic parameters. The deoxygenation constant is calculated with an empirical formula that accounts for field conditions such as river depth, which strongly affects DO levels: the deeper the river, the lower the concentration of microorganisms living in it. Reaeration increases DO levels because oxygen from the atmosphere diffuses into the water. A large reaeration constant means that more oxygen diffuses into the river, which raises DO levels and reduces the potential for an oxygen deficit.
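The text does not reproduce its empirical formulas, but commonly cited reaeration equations illustrate how velocity and depth enter the calculation. The sketch below implements the O'Connor-Dobbins, Churchill, and Owens-Gibbs formulas in SI units (velocity in m/s, depth in m, result in 1/day at 20 °C) and the usual temperature correction with θ = 1.024; whether any of these appear in Table 5 is not stated, so treat them as examples and the function names as hypothetical.

```python
def ka_oconnor_dobbins(u: float, h: float) -> float:
    """O'Connor-Dobbins reaeration constant at 20 °C (1/day); u in m/s, h in m."""
    return 3.93 * u ** 0.5 / h ** 1.5


def ka_churchill(u: float, h: float) -> float:
    """Churchill reaeration constant at 20 °C (1/day); u in m/s, h in m."""
    return 5.026 * u / h ** 1.67


def ka_owens_gibbs(u: float, h: float) -> float:
    """Owens-Gibbs reaeration constant at 20 °C (1/day); u in m/s, h in m."""
    return 5.32 * u ** 0.67 / h ** 1.85


def correct_to_temperature(k20: float, temp_c: float, theta: float = 1.024) -> float:
    """Convert a 20 °C rate constant to temp_c using k_T = k_20 * theta ** (T - 20)."""
    return k20 * theta ** (temp_c - 20.0)


# Hypothetical river segment: 0.5 m/s, 1.5 m deep, water at 28 °C.
for name, f in [("O'Connor-Dobbins", ka_oconnor_dobbins),
                ("Churchill", ka_churchill),
                ("Owens-Gibbs", ka_owens_gibbs)]:
    k20 = f(0.5, 1.5)
    print(f"{name}: ka20 = {k20:.2f}/day, ka28 = {correct_to_temperature(k20, 28.0):.2f}/day")
```

All three formulas give a smaller ka for a deeper river at the same velocity, consistent with the reasoning above; they were derived for different depth and velocity ranges, so the choice between them affects the result.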
The calculated deoxygenation and reaeration constants are then used to construct a DO model with the Streeter-Phelps method, using the equations listed in Table 5. The accuracy of each resulting model is then analyzed. The comparison of the ANN, SVM, and Streeter-Phelps models uses 10 data samples. The Streeter-Phelps results for all equations are shown in Table 6. Model accuracy is assessed with several statistics, namely the coefficient of determination, the bias factor, and the RMSE, which are presented in Table 7. The best Streeter-Phelps model uses the 9th equation, BI, with an RMSE of 0.114. Table 8 compares the three DO prediction models: on the 10 test samples, the RMSE is 0.114 for Streeter-Phelps, 0.110 for SVM, and 0.771 for ANN.
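The bias factor is not defined in the text. A common definition, assumed in the sketch below, is the geometric mean ratio of predicted to observed values, Bf = 10^(mean(log10(predicted/observed))): Bf = 1 indicates no systematic bias, Bf > 1 over-prediction, and Bf < 1 under-prediction. The function name is hypothetical.

```python
import numpy as np


def bias_factor(observed, predicted) -> float:
    """Bf = 10 ** mean(log10(pred / obs)); requires strictly positive values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    if np.any(obs <= 0) or np.any(pred <= 0):
        raise ValueError("the bias factor requires strictly positive values")
    return float(10.0 ** np.mean(np.log10(pred / obs)))


# Illustrative DO values only (mg/L), not the study's Table 7 data.
print(bias_factor([5.2, 6.0, 4.4], [5.0, 6.3, 4.5]))
```

Because it works on ratios, the bias factor complements RMSE: a model can have a small RMSE and still over- or under-predict systematically.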

Conclusion
After the relationship between water quality parameters and dissolved oxygen was analyzed, three models for predicting the DO distribution in the Ciliwung River were compared: SVM, ANN, and Streeter-Phelps. A machine learning approach was used to fill in the 28.8% of missing values. After preprocessing, the SVM and ANN models both performed well, with R² values of 0.998 and 0.961, respectively. Streeter-Phelps modeling offers various empirical equations for predicting DO; the model developed by Baechelor and Lazo achieved an R² of 0.887. The comparison of the three models shows that SVM performs best, with an RMSE of 0.110, ahead of Streeter-Phelps and ANN. Previous research reported that ANN outperforms Streeter-Phelps [14]; in this study, however, ANN had the highest RMSE of the three models (0.771).