Sentiment Analysis of Merdeka Belajar-Kampus Merdeka Program on Twitter Using the Naive Bayes Algorithm

Merdeka Belajar-Kampus Merdeka (MBKM) is a learning model in higher education that provides freedom and flexibility to students. This model aims to create a culture of creativity, innovation, and non-restriction by allowing students to study for three semesters outside their study programs. The program is an initiative of the Ministry of Education and Culture of the Republic of Indonesia in response to the impact of the Covid-19 pandemic. However, the MBKM program has received varied responses from the public, especially from students. As this is a new policy, it is important to analyze and evaluate the program using feedback from the community in order to enhance its performance. This research obtains a model accuracy of 78% in predicting the positive, negative, and neutral classes on 203 testing data, or 20% of all data (1015 data).


Introduction
In the digital era, the internet has driven significant progress in global technology, and many aspects of communication and computing have changed and developed. The number of social networks on the internet is expected to keep growing. Social networks are platforms where users present themselves through dialogue with other users, collaborate, and solve the problems that arise there. Twitter is an independent social network where all users are free to express their opinions. The data on Twitter carries considerable weight and can influence users in Indonesia. One advantage of Twitter is that it provides an API that allows anyone to access Twitter data easily [1].
Big Data refers to a set of data so large and complex that it cannot be processed using traditional database management tools or other data processing applications.
Based on this definition, Big Data has three main characteristics, often referred to as the 3Vs: volume, velocity, and variety. Volume refers to the enormous amount of data that needs to be managed. Velocity refers to the speed of data processing, which must keep pace with the rapid growth in the amount of data. Variety describes the diversity of data sources, including structured and unstructured data [15].
Merdeka Belajar-Kampus Merdeka (MBKM) is a learning model in higher education that gives students autonomy and flexibility. The model aims to create a creative, innovative, and non-restrictive learning culture suited to students' needs by giving them the opportunity to study for three semesters outside their study program. The program is an initiative of the Ministry of Education and Culture of the Republic of Indonesia in response to the impact of the Covid-19 pandemic. However, the MBKM program has received both positive and negative responses from the community, especially from students who felt its impact and hold differing views on MBKM policies. Since this is a new policy, it is important to analyze and evaluate the program, using feedback from the public, in order to improve its performance [2].
Twitter can serve as a source of datasets for sentiment analysis. Indonesia ranks fifth in terms of the number of Twitter users; in 2022, the number of Twitter users in Indonesia reached 18.45 million. Based on this potential, public sentiment regarding the MBKM program is analyzed using data from Twitter. In this study, data are processed using an NLP algorithm to calculate polarity values and determine a label for each record. The final stage involves classification using the K-Nearest Neighbor (KNN) algorithm, which was chosen because it has a high degree of accuracy. The value of k is chosen to achieve the highest accuracy. The purpose of this research is to gain insight into the community's response to the MBKM program. The results will be offered as suggestions for improvement to the Ministry of Education, Culture, Research and Technology (Kemendikbudristek) [4].
Previous research by Prasetyo et al. applied sentiment analysis to the Merdeka Belajar policy, using the Naive Bayes algorithm to classify the sentiment of 180 tweets. It obtained an accuracy of 80.55%, and the majority of sentiment towards the policy was positive [2].

Research Method
The research method refers to a series of systematic steps used to collect, analyze, and interpret data in a study. It aims to provide a structured and logical framework that enables researchers to answer research questions or test proposed hypotheses. Figure 1 shows a flowchart of the stages of the MBKM program sentiment analysis.

Data Collection
Figure 2 shows the data collection stage, in which the dataset is collected from Twitter. Data are collected by crawling with the Selenium Python library, and the crawled data are stored in CSV format.

Data Preprocessing
Figure 3 is a flowchart of the data preprocessing stage. The preprocessing steps are: labeling 1015 records (330 positive, 332 negative, and 353 neutral), removing special characters (#, $, @, etc.) from the dataset, deleting numbers, deleting punctuation marks, deleting emoticons, deleting single characters, tokenizing words, removing stopwords, stemming words, vectorizing words (converting words into binary form (0, 1)), calculating the TF-IDF value of each word, and storing the preprocessed dataset in CSV format.
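The character-level cleaning steps listed above can be sketched with regular expressions. This is a minimal illustration, not the authors' implementation: the function name `clean_tweet` is hypothetical, lowercasing is an added assumption, and treating every non-letter character as removable is a simplification that covers special characters, numbers, punctuation, and emoticons in one pass.

```python
import re


def clean_tweet(text: str) -> str:
    """Remove special characters, digits, punctuation, emoticons, and single characters."""
    # Assumption: lowercase first (the source does not state this step explicitly).
    text = text.lower()
    # Replace everything that is not an ASCII letter or whitespace with a space.
    # This drops #, $, @, digits, punctuation marks, and emoji in one pass.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Delete tokens that consist of a single character.
    text = re.sub(r"\b[a-z]\b", " ", text)
    # Collapse repeated whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical example tweet, for illustration only.
cleaned = clean_tweet("Program #MBKM 2022 keren bgt!!! 😍 @kampus a b")
print(cleaned)
```

Replacing removed characters with a space rather than an empty string keeps adjacent words from being fused together, which matters for the later tokenization step.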

Sentiment Analysis
Sentiment analysis is the computational study of opinions, sentiments, and emotions expressed in text (Liu, 2011). The basic task in sentiment analysis is to classify the polarity of the text in documents, sentences, or opinion pieces. Polarity indicates whether the text in a document, sentence, or opinion is positive or negative [8].

MBKM
Merdeka Belajar-Kampus Merdeka is a policy initiated by the Minister of Education and Culture with the aim of encouraging students to develop a deep understanding of various fields of knowledge, so that they are ready to face challenges in the world of work. This policy is in line with the provisions of Permendikbud Number 3 of 2020, which regulates National Higher Education Standards [12].

Naive Bayes Algorithm
The Naive Bayes algorithm is a classification algorithm that is widely used in data mining and text mining [9]. It is based on Bayes' theorem and assumes that all attributes contribute equally and independently to the selection of a particular class. In text mining, the Naive Bayes method, often called the Naive Bayes Classifier, is one of the classification methods used to describe people's perceptions [10], [11].

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)} \qquad (1)$$

Information:
X = data with unknown class
H = hypothesis that data X belongs to a specific class
P(H|X) = probability of hypothesis H given condition X
P(H) = probability of hypothesis H
P(X|H) = probability of X given hypothesis H
P(X) = probability of X

Probability calculations using the Naive Bayes algorithm for sentiment analysis proceed in stages: calculating TF-IDF, calculating the prior probability of each category, calculating the probability of words in each category, and calculating the posterior probability.

Evaluation Matrix
The evaluation matrix is used to measure how well the model predicts labels in the test data. The higher the accuracy, the better the model classifies data correctly. Four metrics can be used to measure classification performance: accuracy, precision, recall, and F1-score. Accuracy measures the model's overall ability to classify correctly. Recall is the proportion of the data belonging to a class that the model retrieves correctly. Precision is the proportion of the data predicted as a class that truly belongs to that class. F1-score is the harmonic mean of precision and recall.

In one previous study, sentiment analysis showed that 61.92% of the sentiment found was positive, indicating that the MBKM program was well received by Twitter users, especially students, although around 38.08% of the sentiment was negative. The results of that study can serve as a reference for the MBKM policy development team, especially the Ministry of Education and Culture's POKJA team, because the program can provide positive benefits and experiences for students. In addition, the results can be used as evaluation material for future teams to make improvements [1].

Elisa Febriyani and Herny Februariyanti conducted research on the Merdeka Belajar-Kampus Merdeka (MBKM) program using the Naive Bayes classifier algorithm on Twitter. The purpose of the study was to analyze public opinion towards the MBKM program on Twitter, in order to assess the accuracy of the method and the percentage of each sentiment as an evaluation of the algorithm, its performance, and the MBKM program itself. Data were collected in real time using the vicinitas.io website, focusing on tweets and retweets containing the hashtags #kampusmerdeka and #mbkm from November 2021 to March 2022. The analysis covered 501 tweets, classifying each text as either negative or positive using the Naive Bayes classifier algorithm. The classification involved several steps, including text preprocessing, TF-IDF calculation, classification, and K-fold cross-validation. K-fold was used to evaluate the performance of the algorithm in order to achieve maximum accuracy. The program was developed in the Python programming language on the Google Colab platform. The results were visualized as word clouds showing the most dominant words: campus, independence, mbkm, and programs for positive sentiment, and campus, uang (money), pocket, and conversion for negative sentiment. The classification system produced 272 opinions classified as positive and 229 opinions classified as negative, with an average accuracy of 60%, precision of 64%, recall of 58%, and F1-score of 58% [5].
Abdul Rozaq et al. conducted a sentiment analysis of the implementation of the Merdeka Belajar-Kampus Merdeka program using Naive Bayes, K-Nearest Neighbors, and Decision Tree. Public responses to the various Merdeka Belajar-Kampus Merdeka programs expressed through social media varied, including positive, negative, and neutral comments. These comments created a growing sentiment among the general public and academics. Based on this problem, the researchers analyzed sentiment towards the implementation of the program using comment data from Twitter. The Twitter data were classified into positive, negative, and neutral categories using the Naive Bayes, K-Nearest Neighbors, and Decision Tree methods. A total of 475 records were divided into training data and testing data: 20% of the data were used for testing and the remaining 80% for training. The results show an accuracy of 99.22% for Naive Bayes, 96.90% for K-Nearest Neighbors, and 37.21% for Decision Tree [6].
Irma Putri Rahayu and her team conducted research on the MBKM program using the Naive Bayes and SVM algorithms. The study aims to analyze sentiment towards MBKM based on existing positive and negative opinions. It focuses on sentiment analysis on Twitter using the hashtag #kampusmerdeka from 2019 to 2022. The results show 86% accuracy, 87% precision, and 80% recall for the Naive Bayes algorithm. Meanwhile, the SVM algorithm with a linear kernel achieves 93% accuracy, 100% precision, and 84% recall when tested on the same dataset [7].

Result and Discussion
Results and Discussion are an important part of a final report that presents the findings and interpretation of the results of the research or project undertaken. This section describes and analyzes the data collected and provides conclusions based on the findings. It is important to provide an objective interpretation supported by strong evidence from the data collected. Findings and conclusions must be explained systematically and be relevant to the research question or project objectives set previously.

Dirty Tweet Data
Figure 5 shows the tweets obtained from the scraping process on Twitter, totaling 4900 records. The tweet data are first cleaned and then labeled with positive, negative, or neutral sentiment. Labeling was done manually on 1015 records (330 positive, 332 negative, and 353 neutral).

Preprocessing Step 1
Preprocessing stage 1 is the data cleaning process: removing special characters, numbers, punctuation, emoticons, and single characters, and assigning sentiment labels (positive, negative, and neutral) to the 1015 records used for training and testing.

For data labeled negative, the system accurately predicts 86% of the testing data with that label. Figure 9 is a confusion matrix that visualizes the results of the sentiment classification carried out by the model. The number of positive data correctly classified by the model (TP) is 47 out of 63 testing data with positive labels, and the number of data incorrectly classified as positive (FP) is 15. The number of correctly classified negative data (TN) is 60 out of 77 testing data with negative labels, and the number of incorrectly classified negative data (FN) is 3. The number of correctly classified neutral data (TNet) is 52 out of 63 testing data with neutral labels, and the number of incorrectly classified neutral data (FNet) is 27.
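The overall accuracy and the per-class recall implied by these confusion matrix counts can be checked with simple arithmetic. The sketch below uses only the per-class totals and correctly classified counts quoted above; the variable names are illustrative and not taken from the study.

```python
# Number of testing records per true label, as quoted in the text.
support = {"positive": 63, "negative": 77, "neutral": 63}
# Number of records of each true label that the model classified correctly.
correct = {"positive": 47, "negative": 60, "neutral": 52}

total = sum(support.values())                 # size of the test set
accuracy = sum(correct.values()) / total      # correctly classified / all records
# Recall per class: correctly classified records / records with that true label.
recall = {label: correct[label] / support[label] for label in support}

print(f"test size: {total}")
print(f"accuracy:  {accuracy:.2f}")
for label, value in recall.items():
    print(f"recall[{label}]: {value:.2f}")
```

Rounded to two decimals, the counts give an accuracy of 0.78 on 203 testing records and recall of 0.75 (positive), 0.78 (negative), and 0.83 (neutral).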

Conclusion
Based on the sentiment analysis of the Merdeka Belajar-Kampus Merdeka program on Twitter using the Naive Bayes algorithm, an accuracy of 0.78 (78%) was obtained. This means that the system accurately predicts 78% of the testing data, which comprises 203 records, or 20% of the total data (1015 records). For the positive, neutral, and negative sentiments towards MBKM, the average recall was 75% for positive-labeled data, 83% for neutral-labeled data, and 78% for negative-labeled data. Judging by the resulting percentages, more students express neutral sentiment towards MBKM. These results indicate that students are enthusiastic about participating in MBKM.

Figure 1. Research Method
Figure 1 is a flowchart of the stages carried out in the sentiment analysis of the MBKM program on Twitter. The first stage is data collection (collecting the dataset), followed by data preprocessing (preparing and cleaning the data before modeling). The last stage, modeling, creates a Naive Bayes model for sentiment analysis on the dataset.

Figure 4. Modelling
JURNAL ILMIAH MERPATI VOL. 11, NO. 2 AUGUST 2023 p-ISSN: 2252-3006 e-ISSN: 2685-2411
Sentiment Analysis of Merdeka Belajar-Kampus Merdeka Program on Twitter Using the Naive Bayes Algorithm (I Made Teguh Arthana)

3.3.1 Calculating TF-IDF
TF-IDF (Term Frequency-Inverse Document Frequency) is a method used in natural language processing and information retrieval to measure the importance of a word in a document or collection of documents.

$$W_{x,y} = tf_{x,y} \times \log\left(\frac{N}{df_x}\right) \qquad (2)$$

Information:
tf_{x,y} = frequency of word x in document y
df_x = number of documents containing word x
N = total number of documents

3.3.2 Calculating the Prior Probability of Each Category

$$P(V_j) = \frac{|docs_j|}{|example|} \qquad (3)$$

Information:
P(V_j) = probability of class category j
docs_j = number of documents in sentiment category j (negative, neutral, positive)
example = total number of documents in all categories (positive, negative, neutral)

3.3.3 Calculating the Probability of Words in Categories

$$P(x_i \mid V_j) = \frac{n_k + 1}{n + |vocabulary|} \qquad (4)$$

Information:
P(x_i|V_j) = probability of the ith word in the jth category
n_k = number of occurrences of word x_i in category j (positive, negative, neutral)
n = total number of words in category j
|vocabulary| = total vocabulary

3.3.4 Calculating Posterior Probability Values

$$V_{MAP} = \arg\max_{V_j} P(V_j) \prod_{i} P(x_i \mid V_j) \qquad (5)$$

Information:
V_MAP = category (positive, negative, neutral) with the highest posterior probability
P(x_i|V_j) = probability of word i in category j
P(V_j) = prior probability of the jth category (positive, negative, neutral)
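The TF-IDF weighting and the prior, word-probability, and posterior calculations described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the common forms of these formulas (natural-log inverse document frequency, add-one smoothing of word counts, and a maximum a posteriori decision), it trains the classifier on raw word counts, and the function names and toy tweets are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each word x in document y as tf(x, y) * log(N / df_x)."""
    n_docs = len(docs)
    # df_x: number of documents that contain word x at least once.
    df = Counter(word for doc in docs for word in set(doc))
    return [
        {word: tf * math.log(n_docs / df[word]) for word, tf in Counter(doc).items()}
        for doc in docs
    ]


def train_naive_bayes(docs, labels):
    """Estimate category priors and add-one smoothed word probabilities."""
    vocabulary = {word for doc in docs for word in doc}
    priors, likelihoods = {}, {}
    for category in sorted(set(labels)):
        category_docs = [d for d, lab in zip(docs, labels) if lab == category]
        # Prior: share of training documents that belong to this category.
        priors[category] = len(category_docs) / len(docs)
        counts = Counter(word for doc in category_docs for word in doc)
        n = sum(counts.values())  # total number of words in this category
        # Word probability with add-one smoothing: (n_k + 1) / (n + |vocabulary|).
        likelihoods[category] = {
            word: (counts[word] + 1) / (n + len(vocabulary)) for word in vocabulary
        }
    return priors, likelihoods


def predict(tokens, priors, likelihoods):
    """Return the category with the highest posterior score."""
    scores = {}
    for category, prior in priors.items():
        # Sum logarithms instead of multiplying probabilities to avoid underflow.
        score = math.log(prior)
        for word in tokens:
            if word in likelihoods[category]:  # words unseen in training are skipped
                score += math.log(likelihoods[category][word])
        scores[category] = score
    return max(scores, key=scores.get)


# Hypothetical, already tokenized toy tweets (illustration only).
docs = [
    ["program", "bagus", "bermanfaat"],
    ["program", "buruk", "ribet"],
    ["program", "info", "pendaftaran"],
    ["bagus", "seru"],
]
labels = ["positive", "negative", "neutral", "positive"]

weights = tf_idf(docs)
priors, likelihoods = train_naive_bayes(docs, labels)
print(predict(["bagus", "program"], priors, likelihoods))
```

Working in log space leaves the arg max unchanged while keeping the product of many small probabilities numerically stable, which is why `predict` adds logarithms rather than multiplying.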

Figure 6. Preprocessing Step 1

Preprocessing Step 2
Preprocessing stage 2 prepares the data before the modeling stage is carried out. It consists of word tokenization, word normalization, stopword removal, and stemming.
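The four operations of preprocessing stage 2 can be sketched as a small pipeline. Everything named here is an assumption made for illustration: the function `preprocess_tokens`, the tiny slang dictionary, and the short stopword set are hypothetical stand-ins for the resources a real study would use, and stemming is left as a pluggable function (identity by default) because an Indonesian stemmer requires an external library.

```python
# Hypothetical resources for illustration only; real lists are much larger.
SLANG_MAP = {"bgt": "banget", "gk": "tidak", "yg": "yang"}
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}


def preprocess_tokens(text, slang_map=SLANG_MAP, stopwords=STOPWORDS, stem=lambda w: w):
    """Tokenize, normalize, remove stopwords, and stem a cleaned tweet."""
    # Tokenization: split the cleaned text on whitespace.
    tokens = text.split()
    # Normalization: map informal spellings to their standard form.
    tokens = [slang_map.get(token, token) for token in tokens]
    # Stopword removal: drop words that carry little sentiment information.
    tokens = [token for token in tokens if token not in stopwords]
    # Stemming: reduce each word to its root with the supplied stemmer.
    return [stem(token) for token in tokens]


tokens = preprocess_tokens("program yg keren bgt dan bermanfaat")
print(tokens)
```

Normalization runs before stopword removal so that informal spellings such as "yg" are first mapped to "yang" and can then be matched against the stopword list.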

Figure 10. Display the ROC Curve

Visualization Wordcloud Positive, Negative, & Neutral Sentiment
A wordcloud in sentiment analysis is a visual representation of the most frequently occurring words in the text being analyzed, which can give an idea of the general sentiment or feeling associated with a particular subject or topic. In the context of sentiment analysis, a wordcloud is used to identify the words that appear most often in a text and are considered relevant to positive or negative sentiment. The following is a positive sentiment wordcloud.