Detection of Class Regularity with Support Vector Machine methods

One of the main factors affecting students' achievement and learning motivation is a conducive classroom environment, which can be observed from the regularity of the students in the class. Teachers can judge whether the class is conducive by monitoring its condition through video. This research applies image and sound feature extraction, using the centroid extraction method and Mel Frequency Cepstral Coefficients (MFCC), and classifies classrooms as regular or irregular with the Support Vector Machine (SVM) method, based on video recorded by a camera installed in the classroom. The video is split into image data and sound data. Image processing starts with reading the input, followed by preprocessing, segmentation with K-Means, and morphology; the most important step is extracting the feature information, which is then classified by the SVM to determine class regularity. Sound frequency features are extracted with the MFCC method and then classified by the SVM to determine class noise. The experiments yield an accuracy of 78% with the linear kernel and 70% with the polynomial kernel. The test set consists of 50 samples, 25 regular and 25 irregular, taken directly from video recordings. These results indicate that the SVM method gives good classification results for regular and irregular classes.


Introduction
A classroom is a place for intensive teaching and learning activities. Students and teachers interact, giving and receiving lessons in class, to achieve the objectives of national education. One of the factors that influence student achievement and motivation is a conducive classroom environment. A motivating environment makes it easier for students to absorb lessons and to develop initiative (the desire to learn on their own). Student learning achievement can be improved by evaluating the conditions of student learning activities through video recordings. Monitoring the classroom through video also helps the teacher review whether the class is conducive or not. Image and audio data are captured from the video, and each is processed based on its features. The level of class regularity and noise can affect students' motivation and learning achievement [1]. Therefore, this study applies image and sound feature extraction using the Centroid Extraction and Mel Frequency Cepstral Coefficient (MFCC) methods and classifies classrooms as regular or irregular with the Support Vector Machine (SVM) method, based on video from a camera mounted in the class. This is basic research that can be used to build a system called an integrated smart class, one feature of which is monitoring classroom conditions in real time through video (images and sounds). The aim is to make it easier for teachers to monitor the class when they are not present or when students are studying independently.
Several previous studies have applied classification with the Support Vector Machine (SVM). One of these, by I Gede Aris Gunadi et al., is titled Fake Smile Detection Using Linear Support Vector Machine [2]. That study detects whether a smile on a person's face is real or fake, using Region of Interest (RoI) segmentation on the cheeks and eyes. The test results show a system accuracy of 86% and an error rate of 14%. Another study on SVM classification, conducted by Raudlatul Munawarah et al., is titled Application of the Support Vector Machine Method in Hepatitis Diagnosis [3]. It analyzes the ability of the SVM method using training data of 100 positive and 100 negative samples with linear and RBF kernel functions. The classification results are 68-83% with the linear kernel and 70-96% with the RBF kernel.
Research on images of the classroom environment was carried out by Takashi Ozeki and Watanabe in a study entitled Analysis of the Behavior of Students Considering Privacy [4]. This study applies the Haar classifier method to smoothed video, then counts the skin-colored pixels in each detected face area and assigns each face a number. The experiments showed that students facing forward could be classified correctly even in smoothed videos. Research on image feature extraction has been carried out by Kadek Novar Setiawan et al. using K-Means and GLCM. K-Means is used in the segmentation process with 4 clusters, and GLCM is used in the feature extraction process to extract information characteristic of each class. The Support Vector Machine used for classification distinguishes normal from abnormal mammogram images with an accuracy of up to 80%, so the method is considered good enough for classifying mammogram images [5].
Sound feature extraction using MFCC has been studied by Awais et al. Their research used MFCC to extract features from speech signals, with locality-sensitive hashing (LSH) as the classification method. It achieved 92.66% accuracy in speech recognition by matching against the training data [6]. Another study on audio feature extraction using MFCC was conducted by Mohan B and Ramesh Babu N under the title Speech Recognition using MFCC and DTW. It uses Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Dynamic Time Warping (DTW) for pattern matching. The results were obtained with one training phase and a continuous testing phase [7].
Based on these studies, a detection system for class regularity that uses both image and sound features with the Support Vector Machine (SVM) method has not yet been investigated. SVM is a supervised machine learning method that is still widely relied on for binary classification, yet it has not been used to classify object images and sounds in a classroom together. In this study, the two kinds of feature data are modeled by the SVM method, in training and classification, to determine whether class conditions are regular or irregular. It is hoped that this research can contribute classroom image and audio datasets, because the data were acquired directly using the same collection standards, in both the recording device and the camera angle, with the video then separated into images and audio. It is also hoped that this research can serve as a reference for other research on image and audio classification with the SVM method.

Figure 1 Overview of the method approach for class regularity detection
Class regularity detection uses two inputs, derived from images and audio. Each input must produce features that the SVM classifier can use to determine whether the class is regular or irregular. For the image input, the hair position is used as the feature; for the audio input, the feature is the intensity of the sound frequencies produced by students in the class.
The hair position is used as a feature on the assumption that students who pay attention sit in an orderly way, so that their hair positions look regular, lying roughly along a horizontal straight line. Conversely, if students in the class are not facing forward, the position of each student's hair looks irregular. In this study, hair position is characterized by the centroid of each segment obtained. For the audio input, the feature value is taken from the intensity of the sound frequencies produced. The assumption is that the higher the intensity of the sound frequencies in the input, the more the class tends to be irregular, and conversely, the lower the intensity, the more the class tends to be regular.
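As a minimal sketch of the centroid feature described above, the following Python code computes the centroid of each connected region in a small binary mask. The function name `region_centroids`, the use of 4-connectivity, and the toy mask are assumptions made for illustration; the study does not specify how regions are labeled.

```python
from collections import deque


def region_centroids(mask):
    """Return the (x, y) centroid of each 4-connected foreground region.

    `mask` is a list of rows holding 0 (background) or 1 (foreground).
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects every pixel of one region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    in_bounds = 0 <= nx < width and 0 <= ny < height
                    if in_bounds and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # The centroid is the mean pixel coordinate of the region.
            cx = sum(p[0] for p in pixels) / len(pixels)
            cy = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((cx, cy))
    return centroids


# Hypothetical toy mask with two blobs at the same height.
toy_mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
toy_centroids = region_centroids(toy_mask)
```

In this toy mask both regions share the same y-coordinate, which corresponds to the "regular" case in which the hair positions lie along a horizontal line.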
The detailed process for each input, image and audio, is described below.

Image Data
Image data is a representation that resembles its original form, or at minimum a planimetric one. Digital images on a two-dimensional scale are processed and manipulated with image processing methods [8]. The image processing steps in this study are shown in Figure 2. In preprocessing, the input image is converted from the RGB color space to HSV, a color model that agrees with human perception of color similarity [9]. Preprocessing also includes Gaussian blur filtering, which blurs the image and reduces the noise in it [10]. The next stage is image segmentation with K-Means, which this study uses to find the students' hair. Segmentation is a technique for dividing an image into several regions, each with similar attributes [11] [12]. K-Means is an unsupervised clustering algorithm used to segment the more prominent areas from the background [13]. K-Means works well for image segmentation if the image has been partially enhanced beforehand [13]. The segmented image is then processed by morphology in several steps. The first is binarization, which converts the image to binary form, that is, an image with two gray-level values, black and white [14]. Next is closing, which smooths the segmentation and fills in missing pixels. The last steps are erosion, which removes pixels at object boundaries, and opening, which refines object boundaries, separates objects that were previously joined, and eliminates objects smaller than the structuring element [15] [16].

Audio data processing begins with feature extraction, in which a series of quantities is computed from sections of the input signal to determine training or test patterns. The features used in this study are frequency features.
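Before the audio features are described further, the morphological steps of the image branch (closing, erosion, and opening) can be illustrated with a minimal pure-Python sketch on binary images. The 3 x 3 square structuring element, the function names, and the toy images are assumptions for illustration; the study does not state the structuring element it used, and an image library would normally be used in practice.

```python
def _neighbourhood(img, y, x, r):
    """Yield pixel values under a (2r+1) x (2r+1) square centred on (y, x).

    Positions outside the image are treated as background (0).
    """
    h, w = len(img), len(img[0])
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            yield img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0


def erode(img, size=3):
    """Keep a pixel only if every pixel under the structuring element is 1."""
    r = size // 2
    return [[int(all(v == 1 for v in _neighbourhood(img, y, x, r)))
             for x in range(len(img[0]))] for y in range(len(img))]


def dilate(img, size=3):
    """Set a pixel if any pixel under the structuring element is 1."""
    r = size // 2
    return [[int(any(v == 1 for v in _neighbourhood(img, y, x, r)))
             for x in range(len(img[0]))] for y in range(len(img))]


def closing(img, size=3):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(img, size), size)


def opening(img, size=3):
    """Erosion followed by dilation: removes objects smaller than the element."""
    return dilate(erode(img, size), size)


def parse(rows):
    """Convert strings of '0'/'1' characters into a binary image."""
    return [[int(ch) for ch in row] for row in rows]


# Hypothetical 3 x 3 blob with a one-pixel hole in its centre.
holed = parse(["0000000", "0000000", "0011100", "0010100",
               "0011100", "0000000", "0000000"])
# Hypothetical 3 x 3 blob plus an isolated one-pixel speck.
noisy = parse(["0000000", "0111000", "0111000", "0111000",
               "0000000", "0000010", "0000000"])

closed = closing(holed)   # the hole at row 3, column 3 is filled
opened = opening(noisy)   # the speck at row 5, column 5 is removed
```

Closing fills the gap inside the blob while leaving its outline unchanged, and opening discards the speck because it is smaller than the structuring element.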
For sound signals, the feature quantities are usually the output of some form of spectrum analysis; this study uses the MFCC (Mel-Frequency Cepstral Coefficients) method. MFCC is a feature extraction method that calculates cepstral coefficients while taking the characteristics of human hearing into account [17]. This study uses 20 MFCC values, indexed 0 to 19. The audio format used is .wav.
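The perceptual weighting mentioned above comes from the mel scale, on which MFCC filter banks are spaced. A minimal sketch, assuming the widely used formula mel = 2595 log10(1 + f / 700) (some libraries use a different variant) and hypothetical parameter values (0 to 8000 Hz, 10 bands) that are not taken from the study:

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595 / 700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 frequencies (Hz) equally spaced on the mel scale.

    Consecutive triples of these edges define the triangular filters of a
    mel filter bank.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


# Illustrative values only: 10 bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 10)
```

The edges are densely packed at low frequencies and spread out at high frequencies, which mirrors the finer frequency resolution of human hearing in the low range.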

Support Vector Machine (SVM)
The Support Vector Machine (SVM) was developed by Boser, Guyon, and Vapnik and was first presented in 1992 at the Annual Workshop on Computational Learning Theory. The basic concept of SVM is a data calculation technique that uses statistics and learning, with the expected result being predictive ability. SVM can be applied to outcomes that are continuous, binary, categorical, logistic, or multinomial by forming a hyperplane margin [18] [19]. SVM uses a kernel to map the training input to a higher-dimensional feature space and identifies a hyperplane there as the separator [20].

Figure 4 SVM visualization
The concept of classification with SVM can be explained simply as an attempt to find the best hyperplane that separates two or more classes in the input space [21]. Figure 4 shows data belonging to two classes, +1 and -1. Data in class -1 is symbolized by a circle, while data in class +1 is symbolized by a square [22].

Figure 5 Hyperplane SVM margin
The best separating hyperplane (decision boundary) between the two classes can be found by measuring the margin and maximizing it. The margin is the distance between the hyperplane and the closest data point of each class. These closest points are called support vectors. The solid line on the right of Figure 5 shows the best hyperplane, located exactly midway between the two classes, while the circle and square points lying on the margin lines (dashed lines) are the support vectors. Finding the location of this hyperplane is the core of the SVM training process.
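To make the margin and support vectors concrete, the sketch below evaluates a linear hyperplane w·x + b = 0 on a toy data set. The weight vector, bias, and points are chosen by hand for illustration (they are not produced by training and are not from the study). Under the usual SVM scaling, support vectors satisfy |w·x + b| = 1, the distance from the hyperplane to each support vector is 1 / ||w||, and the width between the two dashed margin lines is 2 / ||w||.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b of point x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin lines, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(w, b, points, tol=1e-9):
    """Points lying exactly on a margin line, i.e. |w.x + b| == 1."""
    return [x for x in points
            if abs(abs(decision_value(w, b, x)) - 1.0) <= tol]


# Hand-picked toy data: two points per class.
positives = [(2.0, 2.0), (3.0, 3.0)]    # class +1
negatives = [(0.0, 0.0), (-1.0, 0.0)]   # class -1

# Hand-picked hyperplane that lies midway between (0, 0) and (2, 2).
w, b = (0.5, 0.5), -1.0

labels = [classify(w, b, x) for x in positives + negatives]
width = margin_width(w)
sv = support_vectors(w, b, positives + negatives)
```

Here the margin width equals the distance between the points (0, 0) and (2, 2), the two points closest to the boundary, which are therefore the support vectors.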

Results and Discussion
The algorithm proposed in this study was implemented in the Python programming language. The training data consisted of 125 samples obtained through direct recordings in two different classrooms. Training can begin once image preprocessing and feature extraction are complete.
Testing was carried out using two SVM kernels, the linear kernel and the polynomial kernel. The kernel type is the parameter that modifies the best separating hyperplane in the SVM input space [23]. Choosing the right kernel function is very important because it determines the feature space in which the classifier function is sought. As long as the kernel function is valid, SVM operates correctly even when the explicit mapping is unknown [24]. SVM then uses the hyperplane as its decision boundary.

Linear Kernel
The linear kernel is the most straightforward kernel function. It is used when the data being analyzed is linearly separable. Linear kernels are also suitable when there are many features, because mapping to a higher-dimensional space does not noticeably improve performance, as in text classification. In text classification, both the number of instances (documents) and the number of features (words) are large.

Polynomial Kernel
The polynomial kernel is a kernel function used when the data is not linearly separable. It has two parameters: c, a constant term, and d, the degree of the kernel.
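The linear and polynomial kernels described above can be written compactly. The sketch below assumes the common forms K(x, y) = x·y for the linear kernel and K(x, y) = (x·y + c)^d for the polynomial kernel; some libraries add a scaling factor to the dot product, which is omitted here, and the example vectors and parameter values are illustrative.

```python
def linear_kernel(x, y):
    """Linear kernel: the plain dot product of x and y."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel with constant term c and degree d."""
    return (linear_kernel(x, y) + c) ** d


def gram_matrix(kernel, samples):
    """Pairwise kernel values for a list of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Illustrative feature vectors, not taken from the study.
u, v = (1.0, 2.0), (3.0, 4.0)

k_lin = linear_kernel(u, v)                 # 1*3 + 2*4
k_poly = polynomial_kernel(u, v, c=1.0, d=2)  # (11 + 1) ** 2
gram = gram_matrix(linear_kernel, [u, v])
```

With c = 0 and d = 1 the polynomial kernel reduces to the linear kernel, so the linear kernel can be viewed as its simplest special case.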

Training Data
Training in this system begins by loading all the image and audio data that have been prepared as training data; a total of 125 samples are used. Once the data is loaded, the preparation process follows. The training data is displayed stage by stage, starting with the results of image preprocessing, which consists of converting the RGB image to HSV, filtering with Gaussian blur, and segmenting with K-Means. The K-Means clustering parameters are K = 5 and 10 iterations. After segmentation, the Hue, Saturation, and Value channels are separated. Post-processing, applied once the V channel has been selected, consists of Otsu thresholding, closing, erosion, and opening. Image features are then extracted from the post-processed image by centroid extraction, which finds the center-point coordinates (x, y) of each object. The stages of training image processing can be seen in Figure 6.

The graph in Figure 7 shows the spectrum of MFCC values generated over 5 seconds, while the table presents 20 MFCC values. The cepstrum, in the form of feature coefficients of the sound signal, is the output of MFCC feature extraction; these coefficients serve as characteristic values of the sound signal so that its pattern is easily recognized. Data modeling in the training menu is carried out after all training data has been entered.
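The K-Means segmentation step mentioned above (K = 5, 10 iterations) follows Lloyd's algorithm, sketched below in pure Python. The evenly spaced initial centres, the function name, and the one-dimensional toy data are assumptions for illustration; the study does not describe its initialisation, and the toy example uses K = 2 for brevity.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm with a fixed iteration count.

    `points` is a list of equal-length numeric tuples. Initial centres are
    taken at evenly spaced positions in `points` (an assumption made here
    for determinism; libraries usually use random or k-means++ seeding).
    """
    step = len(points) / k
    centres = [points[int(i * step)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(p, centres[j])))
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centres, labels


# Hypothetical brightness values: three dark pixels and three bright pixels.
pixels = [(10,), (12,), (11,), (200,), (205,), (198,)]
centres, labels = kmeans(pixels, k=2, iterations=10)
```

For an image, `points` would hold one tuple per pixel (for example its H, S, and V values), and the returned labels would be reshaped to the image size to give the segmentation map.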

Testing and Evaluation Data
The results of testing performed by the system can be seen in Figure 8. Test data prepared in the acquisition phase is processed to produce a classification, which is stored after being evaluated by an expert. Testing was carried out using the same 50 test samples for each kernel. Kernels are used in SVM to classify data that cannot be separated linearly. SVM is the best-known kernel-based method, that is, a method that uses a kernel to represent data, and it is applied to a wide range of data classes [25]. After testing, the results are evaluated by an expert, in this case the teacher, who compares the system's classification with the actual conditions. The evaluation menu interface is shown in Figure 9. The test results for each kernel are presented as confusion matrices in Table 1 and Table 2. Based on Table 1, the linear kernel achieves 78% accuracy, 74% precision, 89% recall, and an F-measure of 80%. Based on Table 2, the polynomial kernel achieves 70% accuracy, 69% precision, 77% recall, and an F-measure of 73%.
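The four reported measures follow directly from the counts in a confusion matrix. The sketch below uses hypothetical counts chosen only for illustration (they are not the values in Table 1 or Table 2) and treats one class as the positive class; the function name is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-measure from counts.

    tp / fn: positive samples classified correctly / incorrectly.
    tn / fp: negative samples classified correctly / incorrectly.
    Assumes at least one predicted positive and one actual positive,
    so that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for 50 test samples (25 per class).
metrics = classification_metrics(tp=20, fp=5, fn=5, tn=20)
```

Precision answers how many of the samples predicted as positive were truly positive, while recall answers how many of the truly positive samples were found; the F-measure is their harmonic mean.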

A comparison of the accuracy, precision, recall, and F-measure of the linear and polynomial kernels can be seen more clearly in the following graph.

Figure 10 Comparison graph of the accuracy, precision, recall, and F-measure of linear and polynomial kernels
Figure 10 shows that the linear kernel classifies regular and irregular classes more successfully on average than the polynomial kernel, in terms of accuracy, precision, recall, and F-measure. On the same 50 test samples, the linear kernel classifies more samples correctly than the polynomial kernel, because it separates the data linearly, with a straight line. Similar results were obtained by Supriya Pahwa in the study "Comparison Of Various Kernels Of Support Vector Machine", which reported that the linear kernel gives the best performance, with an average of 88.20% correct classification, compared to other types of kernel functions [26].

Conclusion
This research aims to classify the condition of a classroom as regular or irregular, motivated by the problem that students tend to make noise when the teacher is not in class.
Based on the experiments conducted in this research, several conclusions can be drawn. To determine whether the class is regular or not, the image and audio data of the class must first go through a processing stage. The images were processed through preprocessing, segmentation with K-Means, and hair centroid extraction, and the centroids were used as features in this study. The method used for sound feature extraction is MFCC. Testing was carried out with 125 training samples and 50 test samples for each kernel, yielding an accuracy of 78% with the linear kernel and 70% with the polynomial kernel. It can be concluded that SVM with a linear kernel works well in classifying regular and irregular classes.