Comparative Analysis of SVM and CNN for Pneumonia Detection in Chest X-Ray

Pneumonia can be recognized by analyzing chest X-rays. Pneumonia sufferers experience pleural effusion, a buildup of fluid between the layers of the lungs, which makes the lungs' X-ray image appear cloudy; in normal lungs, the same region appears dark. This difference is the characteristic that allows the data to be classified. Support Vector Machines (SVM) and Convolutional Neural Networks (CNN) were employed in this study to identify pneumonia in X-ray images. SVM optimizes a hyperplane to separate data classes, while CNN uses convolution and pooling layers to learn patterns in the image. The data were obtained from Ganesha General Hospital, Gianyar, Bali, and from research by J.P. Cohen et al. CNN has several capabilities, such as automatic feature extraction, shared parameters, position invariance, and good generalization, so it can classify limited data. This research applied Principal Component Analysis (PCA) and Wavelet Transformation to support both methods. The PCA-SVM model gave the best performance. The SVM model outperformed the CNN model in recognizing images, which may be due to the relatively small amount of training data.


Introduction
Chest radiography with posteroanterior and lateral views, known as chest X-ray, is the imaging examination for evaluating typical bacterial pneumonia [1]. Pneumonia can cause pleural effusion, a condition in which fluid fills the space around the lungs and makes breathing difficult. Chest X-ray imaging is the most commonly utilized technique for diagnosing pneumonia [2]. Lungs infected with pneumonia show an unusual white or hazy shadow on X-ray images, whereas this area is usually dark in normal lungs [3]. This is because during an X-ray examination the machine sends short waves of X-ray radiation to scan the organs in the body, and the radiation absorbed by each part of the body varies with the density of that part.
Machine learning and deep learning are prevalent, well-performing methods used in previous studies to detect pneumonia from chest X-ray images. This research aimed to determine the performance of a machine learning method, represented by the Support Vector Machine (SVM), and compare it with the performance of a deep learning model, represented by the Convolutional Neural Network (CNN), in detecting pneumonia from chest X-ray images. CNN and SVM are widely used techniques often applied to classification work [4], [5]. The research also aimed to determine the impact of wavelets and Principal Component Analysis (PCA) on the performance of SVM and CNN in detecting pneumonia from chest X-ray images. SVM's ability to handle high-dimensional data, such as images, is one of its critical advantages in image classification [6], [7]. Compared to other algorithms such as neural networks, SVMs also have a lower overfitting rate. SVM is a robust supervised learning method that performs well on complex but smaller datasets [8] and works well when there are more dimensions than samples. SVM also requires a shorter computing time than CNN on a limited amount of data [9].
CNNs are usually better than SVMs for image classification because they can learn more complex features from images. CNNs are specifically designed to extract features from images, while SVMs are more general classifiers. SVM is unsuitable for large datasets due to its long training time, since the size of the dataset greatly influences the complexity of SVM training [10]. CNNs are generally preferred over SVMs for image classification due to their ability to learn relevant features automatically. The choice between machine learning and deep learning for image classification depends on factors such as data availability, feature complexity, computing resources, and the desired level of performance. In many cases, deep learning, especially CNNs, has demonstrated superior performance on large and complex image datasets.
The specific task and dataset determine whether to use CNN or SVM. SVM is still a good choice in some situations, especially when working with smaller datasets or when interpretability is essential. In general, however, CNNs are preferred by many machine learning practitioners because they can handle complex data, learn from unprocessed input, and achieve higher accuracy.
SVM is a well-reported method for image recognition in the healthcare domain [11], [12]. In medical image recognition tasks, SVM outperforms several machine learning methods, such as KNN and Random Forest [13], [14]. Complex medical image recognition can be supported by dimensionality reduction methods such as PCA. The increase in image classification performance from combining PCA with machine learning is supported by research results [15]-[18]. Successful image detection is also supported by good feature extraction in medical images; in this case, the wavelet method is used for this purpose. The success of wavelets in improving image classification performance with machine learning was reported in [11], [16], [19], [20].
The CNN method is suitable for spatial domains [22]. As with image recognition using machine learning, PCA has also been reported to support performance in image recognition using deep learning methods; PCA support for improving CNN performance in image recognition was reported in [23]-[27]. Even though CNN is an image recognition method that can be used without a prior feature extraction process, several researchers have also used wavelets in conjunction with CNN. Research has reported that wavelets support image recognition using CNN [28].
The SVM method has also been reported to outperform the CNN method in image classification. Research conducted by [29] carried out hyperspectral image classification by comparing a machine learning method (SVM) with a deep learning method (CNN). In addition, PCA was used to reduce high dimensionality, noise, and information redundancy in the image data. The SVM kernels used were RBF and linear. That research obtained the highest accuracy when using SVM-RBF on the Hyperspec-VNIR Chikusei dataset, with an accuracy of 98.84%. Meanwhile, research [30] reported that the CNN method is superior to the CNN-SVM method for classifying human skin disease, and research [31] reported that CNN outperformed SVM in flower image classification.
Three previous studies used the same data as this research. Research by [20] found that the Wavelet Transform and SVM could perform well in classifying images of lungs infected with COVID-19.
Research by [19] continued this work and identified the wavelet variant that provided the best performance in classifying images of lungs infected with COVID-19. These two studies only examined the performance of the Wavelet Transform with SVM, so in [32]'s study, classification of COVID-19-infected lungs was carried out with CNN, producing good performance with a three-convolution-layer architecture. The third study, however, did not examine Wavelet Transform support for the CNN method, and none of the three examined PCA support for the SVM and CNN methods. For this reason, we carried out this research. The datasets used had different shapes, so several pre-processing stages were carried out before the feature extraction process: converting the images to grayscale, resizing them to a uniform size of 160x160 pixels, and cropping them to remove unnecessary areas and focus on the part to be classified.

Wavelet Transform
As a feature extraction method, the Wavelet Transform analyzes a signal to obtain spectral and time information simultaneously [34]. One form of the Wavelet Transform is the Discrete Wavelet Transform (DWT). In wavelet decomposition, a single wave called the mother wavelet determines the decomposition and acts as a bandpass filter. The DWT produces four sub-bands: an approximation coefficient and three detail coefficients. The detail components are produced through high-pass and low-pass filters [28] and consist of vertical, horizontal, and diagonal coefficients.
This research used the Discrete Wavelet Transform (DWT) to extract image features. The DWT has various wavelet families, such as Haar, Daubechies, Biorthogonal, Coiflets, and Symlets. This research used one of these variations, Daubechies (db2), based on research by [19] showing that Daubechies provided the best accuracy compared to other variations. Daubechies wavelets calculate a running average and difference using a scalar product [35]. Introduced by Ingrid Daubechies, they have unique characteristics: they are defined by the degree of the polynomial, the number of vanishing moments, and the length of the filter coefficients used [36]. In this research, the wavelet transformation was only carried out to level-1 decomposition. The wavelet transform decomposes the image into sub-bands one quarter the size of the original image, so the resulting images in this research were 80x80 pixels.
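As an illustration of level-1 decomposition, the sketch below implements a Haar-style 2D DWT in NumPy. The study itself used the Daubechies db2 wavelet, typically obtained from a library such as PyWavelets; Haar is used here only because its filters are short enough to write out directly. The sketch shows the four sub-bands and the reduction of a 160x160 image to 80x80 sub-bands.

```python
import numpy as np

def haar_dwt2_level1(img):
    """Level-1 2D Haar-style DWT: returns the approximation (LL) and the
    horizontal, vertical, and diagonal detail sub-bands (LH, HL, HH).
    Each sub-band is half the input size in both dimensions; the /4
    normalization makes LL the average of each 2x2 block."""
    a = img[0::2, :] + img[1::2, :]        # low-pass along rows
    d = img[0::2, :] - img[1::2, :]        # high-pass along rows
    LL = (a[:, 0::2] + a[:, 1::2]) / 4.0   # low-low: approximation
    LH = (a[:, 0::2] - a[:, 1::2]) / 4.0   # low-high: horizontal detail
    HL = (d[:, 0::2] + d[:, 1::2]) / 4.0   # high-low: vertical detail
    HH = (d[:, 0::2] - d[:, 1::2]) / 4.0   # high-high: diagonal detail
    return LL, LH, HL, HH

img = np.random.rand(160, 160)             # stand-in for a pre-processed X-ray
LL, LH, HL, HH = haar_dwt2_level1(img)
print(LL.shape)  # (80, 80): each sub-band is a quarter of the original image
```

A flat image produces a zero diagonal detail sub-band, which is a quick sanity check that the high-pass path only responds to intensity changes.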

PCA
Features help in achieving high accuracy; however, as the number of features increases, the complexity of the model and the computing time also increase [26]. PCA is a method used to transform correlated variables into a smaller number of variables. PCA is used for several purposes, such as finding relationships between dimensions, extracting information from data, and reducing large dimensions to smaller ones [18]. It is also often used to overcome feature duplication problems in data [37]. Image data is high-dimensional, so PCA is effective for feature extraction [37]. A simpler space for high-dimensional data can be found by determining the eigenvectors of the covariance matrix of the data. The best or most influential eigenvectors are those with the largest eigenvalues, and these vectors are called principal components [38]. An important principal component corresponds to an eigenvalue of the correlation matrix greater than one [39]. In this research, three component counts were applied to each method: 20, 50, and 100.

CNN
CNN is a development of the Multi-Layer Perceptron (MLP), but CNN is more often used for image cases [22]. The CNN method has two stages: feature learning using convolution, and classification. This deep network comprises an input layer, hidden layers, and an output layer [25].
CNN is a machine-learning technique that extracts hierarchical features from image data using convolution and pooling layers. It is beneficial for small datasets due to its efficient parameter usage: CNN uses shared parameters, requiring only a small number of parameters to recognize local patterns. It also has invariance to spatial shifts, identifying the same pattern at various locations in the image, which helps with position or rotation variations in limited datasets. Furthermore, CNN can generalize well to new data, applying general patterns to previously unseen data. Several studies [40], [41] examining the use of CNN to recognize images with small datasets show that CNN can recognize images without overfitting.
The convolution process has convolution and down-sampling stages to perform feature learning from the input image. The result then enters a classification stage called the fully connected layer, with a multilayer perceptron backpropagation process in the neural network [28]. The convolution process multiplies two matrices, and the result is called a feature map. Down-sampling, namely the pooling layer, is then used to reduce the image size [42]; this research used max pooling. Two activation functions were utilized in the CNN: ReLU and sigmoid. The Rectified Linear Unit (ReLU) can solve the vanishing gradient problem, while the sigmoid activation function was employed at the classification stage for two-class classification. Regularization techniques, such as L2 regularization, were used in this research to control model complexity, along with dropout, a regularization technique to prevent overfitting [42]. This research used L2 regularization with a value of 0.0001 and dropout with a rate of 0.3.
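The feature-learning operations described above can be sketched in a few lines of NumPy: a "valid" convolution producing a feature map, ReLU zeroing negative activations, and 2x2 max pooling halving each dimension. The 3x3 kernel values here are random placeholders, not trained weights.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Valid 2D convolution (cross-correlation form, as in CNN layers):
    slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)              # zero out negative activations

def maxpool2x2(x):
    """2x2 max pooling: keep the largest value in each 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.random.rand(160, 160)             # stand-in grayscale X-ray
kernel = np.random.randn(3, 3)             # one hypothetical 3x3 filter
fmap = maxpool2x2(relu(conv2d_valid(img, kernel)))
print(fmap.shape)  # (79, 79): 160 - 3 + 1 = 158, then halved by pooling
```

A real CNN layer applies many such filters in parallel and learns the kernel values by backpropagation; this sketch only shows the forward arithmetic.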
In addition, data augmentation is often used for limited data to increase data variation in the training process. The data augmentation techniques used were flip, rotation, and shift. Before being trained, the model was compiled using the Adam optimizer to maximize accuracy. Some hyperparameters, such as the learning rate, were tuned to streamline the training process [43]; the learning rate used in this research was 0.0001. Figure 4 shows the CNN architecture used in this research. The implementation of the wavelet transform supports feature enhancement, thereby increasing the performance of the CNN model [45]. Likewise, PCA helps with dimension reduction so that the model runs faster without losing essential information from the image [46].
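The flip, rotation, and shift augmentations named above can be sketched with plain NumPy array operations. In practice a framework utility would typically generate these on the fly; the 50% application probabilities and the ±5-pixel shift range here are illustrative assumptions.

```python
import numpy as np

def augment(img, rng):
    """Randomly apply the augmentations named in the text:
    horizontal flip, 90-degree rotation, and a small pixel shift."""
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)                 # horizontal flip
    if rng.random() < 0.5:
        img = np.rot90(img)                        # rotation
    dy, dx = rng.integers(-5, 6, size=2)           # shift up to 5 pixels
    img = np.roll(img, (dy, dx), axis=(0, 1))
    return img

rng = np.random.default_rng(0)
batch = [augment(np.random.rand(160, 160), rng) for _ in range(8)]
print(len(batch), batch[0].shape)  # 8 (160, 160)
```

Each augmented image keeps the original 160x160 shape, so the same model input size works for both original and augmented samples.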

SVM
A machine learning technique called the Support Vector Machine is typically applied to classification cases. The SVM classifier determines the maximum margin between hyperplanes to separate classes [47]. The linear class separator uses the following formula [48]:

f(x) = w · x + b

In this equation, w is the weight vector and b is the bias, which together determine the position of the hyperplane. SVM can map the input sample space to a high-dimensional feature space through kernel mapping, so SVM has the advantage of preventing overfitting and is well suited to small, high-dimensional, and nonlinear datasets [49].
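As a minimal illustration of the linear decision rule, the snippet below classifies a point by the sign of w · x + b. The weight vector and bias are hypothetical values, not a trained model.

```python
import numpy as np

# Hypothetical weights and bias for a 3-feature linear separator
w = np.array([0.5, -1.2, 0.3])   # weight vector (normal to the hyperplane)
b = 0.1                          # bias shifting the hyperplane

def predict(x):
    """Linear SVM decision rule: the sign of f(x) = w . x + b."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([1.0, 0.0, 0.0])))  # 1, since 0.5 + 0.1 >= 0
```

Training an SVM amounts to choosing w and b so that the margin between the two classes around this hyperplane is as wide as possible.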
SVM has various kernels that can be used for classification, such as the RBF, linear, polynomial, and sigmoid kernels. The kernels used in this research were RBF and linear. The RBF (Radial Basis Function) kernel was effective in image classification for leaf disease cases in [50], producing higher accuracy than other methods. In addition, research by [4] compared four SVM kernels: linear, RBF, polynomial, and sigmoid.
Classification model testing was done using the 10-fold cross-validation method, which splits the dataset into ten random samples [51]. Ten folds is a measure commonly used in testing [19].
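A minimal sketch of this evaluation protocol with scikit-learn, comparing the two kernels used here under 10-fold cross-validation. Synthetic two-class data stands in for the X-ray features; the sample count of 165 mirrors the study, but everything else is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the 165-image, two-class dataset
X, y = make_classification(n_samples=165, n_features=100, random_state=0)

for kernel in ("rbf", "linear"):           # the two kernels used here
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=10)
    print(kernel, scores.mean().round(3))  # mean accuracy over the 10 folds
```

Each of the ten folds serves once as the held-out test set, so the reported score averages ten train/test splits rather than depending on a single random split.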

Model Evaluation
The tested model was measured to see its performance. One technique used to measure model performance is the confusion matrix, which is analyzed to identify the model's effectiveness in classification [52]. For binary classification, there are four measures: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). TP is positive data predicted to be positive, FP is negative data predicted to be positive, TN is negative data predicted to be negative, and FN is positive data predicted to be negative.
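From these four counts, the usual evaluation metrics follow directly. The sketch below computes accuracy and F1 from hypothetical counts for a 17-image test set; the counts are illustrative, not the study's results.

```python
def metrics_from_confusion(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # of predicted positives, how many were right
    recall = tp / (tp + fn)      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for a 17-image test set
acc, prec, rec, f1 = metrics_from_confusion(tp=8, fp=1, tn=7, fn=1)
print(round(acc, 3), round(f1, 3))  # 0.882 0.889
```

F1 is the harmonic mean of precision and recall, which is why the paper uses it alongside accuracy to judge whether the two classes are recognized in a balanced way.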

Result and Discussion
The images are preprocessed before entering the model. The result of preprocessing a COVID-19 image is shown in Figure 6: the image has been converted to grayscale, cropped, and resized. Validation was carried out using ten-fold cross-validation.

SVM Experiment Result
The first experiment showed that the SVM model with the RBF kernel provided better performance in recognizing pneumonia lung images, as shown in Table 1. Although the accuracy was the same, the F1-Score of the SVM with the RBF kernel was higher, meaning the model's ability to recognize pneumonia and normal lung images was more balanced. The confusion matrix produced by the best model in this first experiment, SVM-RBF, is shown in Table 2. For the wavelet-SVM experiment, we referred to [53], which found that the Daubechies wavelet variant performs best when used with the SVM model to recognize COVID-19 lung images; the results of this experiment are presented in Table 3. The linear SVM model performed best with the approximation sub-band, balancing accuracy and F1-Score. Experiments using the PCA-SVM model were varied over several component counts, where the number of components indicates the size of the representation after dimensionality reduction. Table 5 below shows that the linear SVM model with 100 components gives the best accuracy and F1-Score, although overall the number of components did not greatly affect model performance: the data could be reduced to 20 components without significantly reducing SVM performance. The only significant difference appeared with the RBF kernel at 100 components, where the SVM failed to recognize images once the number of PCA components grew that large.

CNN Experiment Result
The following experiment measured the CNN model's performance with the architecture described previously. Table 7 below presents the CNN model's performance in recognizing pneumonia lung images. Epoch 250 gave the best performance, but when the epoch count was increased to 300, the performance of the CNN model decreased drastically. Wavelet-CNN performance increased as the number of epochs increased, as shown in Table 9.
The highest accuracy and F1-Score were obtained at 250 epochs, and the model's performance dropped when the number of epochs increased to 300. As in the wavelet-SVM experiment, the Daubechies variant was used. The PCA-CNN experiment used 100 components, and the best performance was obtained from a 250-epoch configuration, as shown in Table 11.

Comparison Result
This research selected the best model from each experiment described previously; Table 13 and Figure 9 present the full comparative results. For the data in this research, 165 X-ray lung images with balanced classes, the PCA-SVM model with a linear kernel and 100 PCA components gave the best results, with an accuracy of 94.545% and an F1-Score of 94.675%, slightly higher than the wavelet-SVM model. The SVM model supported by PCA or wavelets provided better performance in recognizing images.
The opposite happened with the CNN model: PCA and wavelet support decreased its performance, and the stand-alone CNN model performed better. When the wavelet transformation or PCA process was applied, the data became smaller and some details were lost, details the CNN model may have needed to recognize the image. As mentioned in the introduction, the CNN model does not require prior feature extraction because it performs feature extraction and weight computation during training. Based on the experimental results, the single CNN model performs better than the CNN model with PCA or wavelets because these transforms reduce feature dimensionality, which may reduce the capacity of the model and its ability to handle data complexity. CNNs can also adapt better to complex image data, especially when some complicated patterns or features cannot be represented effectively by wavelet transforms or PCA.

Conclusion
In this research, the PCA-SVM model provided the best performance. A similar result, where SVM with the RBF kernel is superior to CNN, is also supported by research [29]. SVM was superior to CNN presumably due to the limited image data in this research, whereas CNN provides better performance when the training data is large enough. This is also supported by the results of [54]: when the data is extensive, CNN is superior, but when the amount of data is small, SVM performs better than CNN. PCA and wavelets enhanced the SVM model's ability to identify pneumonia lung images, but worsened the CNN model's performance. This may be because the transformed image is smaller, so several essential features needed by CNN were lost. In future research, it is recommended that more training data be used, in which case CNN will likely provide better results than the SVM method.
This research was carried out through several stages, shown as a flowchart in Figure 1 below.

Figure 1. Research Flowchart
This research began with collecting pneumonia image data. Since pneumonia is a condition in which COVID-19 attacks the lungs and triggers inflammation, this research used the lungs of COVID-19 sufferers to represent pneumonia lung data. The data collected totaled 165 images: 82 images of lungs infected with COVID-19 and 83 images of normal lungs. The images were taken from two different sources: the COVID-19 lung images were taken from [33], and the normal lung images were taken from Ganesha General Hospital, Gianyar, Bali. The collected datasets have been validated and verified by experts, and all the data come from different individuals. The data gathered for COVID-19 detection was split into training and testing sets: 90% of the data was used for training and the remaining 10% for testing. Therefore, of the 165 photos, 148 are used for training and 17 for testing. Figure 2 and Figure 3 below are COVID-19-indicated lung and normal lung images.

Figure 2 and Figure 3. COVID-19-indicated lung and normal lung images.

Figure 4. CNN Architecture
In this architecture, the research used three convolution layers with 32, 64, and 128 filters, respectively, 3x3 kernels, and max pooling with a size of 2x2. The classification stage consisted of a flatten operation, a fully connected layer with 256 neurons, sigmoid activation, and a dropout layer. The Wavelet Transform is applied before the CNN model, even though the network performs feature extraction in its layers, because wavelet decomposition can simplify the work of the CNN model [44].

Figure 6. Before and After Pre-Processing Image
All measurement results in this research are model evaluations on the testing data, validated using ten-fold cross-validation.

Figure 7. Performance Comparison Chart Between Models

Table 13. Model Performance Comparison Results
In this experiment, the SVM model performed better than the CNN model. The SVM model with PCA support provided an accuracy of 94.545%, whereas the CNN model provided an accuracy of 86.728%. This could be because CNNs have drawbacks that limit their performance and applicability: their major drawback is that they need a large amount of labeled data to train efficiently. The 148 training images in this research could not support the CNN in providing better performance than the SVM model.