Sentiment Analysis of Indonesian YouTube Reviews About Lesbian, Gay, Bisexual, and Transgender (LGBT) Using IndoBERT Fine-Tuning

Lesbian, gay, bisexual, and transgender (LGBT) refers to individuals whose sexual orientation or gender identity differs from that of the heterosexual majority. The LGBT community now appears openly on social media, which today serves both as a source of information and as a place to post comments. In Indonesia, however, LGBT is still generally viewed as deviant behavior. This research was conducted to understand Indonesian society's views on LGBT through social media, specifically YouTube. Text mining is used to analyze the comments and classify them as pro or contra. The model used is IndoBERT, a Bidirectional Encoder Representations from Transformers (BERT) model for Indonesian, chosen because the comments studied are in Indonesian. The dataset consists of 1,493 comments. The research stages are preprocessing (case folding, data cleaning, tokenization, stopword removal, stemming, and normalization), data labeling, and model building with IndoBERT fine-tuning. IndoBERT achieves an accuracy of 74%.


Introduction
The growth of the LGBT community in Indonesia has attracted the attention of many parties and has become a hot topic of conversation in society, primarily on social media. LGBT has become an issue of public attention because LGBT people increasingly dare to appear openly, primarily through promotional activities and advertisements spread on social media. On social media, the LGBT movement has generated various comments, both supportive and opposing [1]. In Indonesia, LGBT is still generally seen by some people as deviant behavior because it departs from conventional sexual orientation. The debate covers different points of view: some people hope that the existence of LGBT people will be respected as part of human rights, while others consider this behavior deviant and sinful [2]. Social media, especially YouTube, is a primary source of information and a place where many public comments and opinions are expressed [3]. The comment section on YouTube is a forum for users to respond to uploaded content, so sentiment analysis of YouTube comments can be an alternative way to understand public opinion and sentiment toward that content [4].
A study on sentiment analysis of cyberbullying on Twitter achieved an accuracy of 81% and an F1-score of 80.67% [5]. Another study performed sentiment analysis of English-language film reviews using the Bidirectional Encoder Representations from Transformers (BERT) approach. Its data, the Movie Review Polarity Dataset v2.0, was downloaded from the cornell.edu website and contains 2,000 film reviews, of which 1,000 are positive and 1,000 are negative. The study showed that BERT-base can classify not only short sentences but also longer documents: even though only the final 128 characters of each document were used, BERT-base still trained well and reached a relatively high accuracy of 73.7% [6] [7].
Several studies have applied sentiment analysis using various methods. In sentiment analysis of Twitter users, the BERT algorithm achieved slightly better classification performance than the random forest algorithm, while simulation results show that random forest required significantly less computing time [8]. Another study performed sentiment analysis on Twitter using Naïve Bayes on 1,579 tweets. It reported an accuracy of 95%, with a positive polarity value of 77% and a negative polarity of 150.2%. To confirm these results, K-fold cross-validation with k = 10 was also carried out, yielding an accuracy of 87.58% [9].
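The 10-fold cross-validation mentioned above can be illustrated with a minimal index-splitting sketch. It assumes the data have already been shuffled and simply partitions item indices into k contiguous folds; the function name `k_fold_indices` is hypothetical, and this is not the implementation used in [9].

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds over n items.

    Assumes the items were shuffled beforehand; every item lands in exactly
    one test fold, and fold sizes differ by at most one.
    """
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Example: 1,579 items (the dataset size reported in [9]) split into 10 folds.
folds = list(k_fold_indices(1579, k=10))
print([len(test) for _, test in folds])
# [158, 158, 158, 158, 158, 158, 158, 158, 158, 157]
```

The model is trained on each `train` list and scored on the matching `test` list; the cross-validated accuracy is the mean of the k per-fold accuracies.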
Another study implemented the SVM algorithm for sentiment analysis of LGBT reviews using the support vector machine method. Classification used three kernels: linear, polynomial, and Radial Basis Function (RBF). The SVM achieved 74% accuracy with the linear kernel using a 90%:10% data split, and 74% with the RBF kernel using C=100 and gamma=0.01 [10].
Another study performed sentiment analysis of YouTube video comments using support vector machines, classifying Indonesian-language YouTube comments on daily vlogs, culinary videos, unboxing, hack/DIY videos, and music cover videos. Using TF-IDF feature extraction and a polynomial kernel, the support vector machine achieved an accuracy of 86%, precision of 87%, recall of 99%, and F1-score of 100% [11].
Another study analyzed sentiment about rising prices of basic necessities on YouTube and social media using the SVM algorithm. It used a support vector machine to classify positive and negative comments and applied SMOTE to balance the number of samples per label. Among the four SVM classification models tested, linear SVM with SMOTE achieved the highest accuracy at 86.33%, with precision of 75%, recall of 66.67%, and F1-score of 70.59%. Testing on validation data (new test data outside the dataset) yielded an accuracy of 72% [12].
A study on sentiment analysis of Electronic System Operator (PSE) policies used the Bidirectional Encoder Representations from Transformers (BERT) algorithm on a dataset of 5,016 entries, with a batch size of 16 and 5 epochs, resulting in accuracy levels of 69%, 55%, and 55% [13]. One way to manage hate speech on social media is to classify each sentence as either hate speech or a normal sentence. A classification model built with a feedforward neural network on IndoBERT achieved the best accuracy in that study, 89.52% [14]. The novelty of this research lies in producing a sentiment analysis of Indonesian YouTube reviews regarding LGBT using IndoBERT. This study is therefore expected to contribute to the development of sentiment analysis classification models using IndoBERT.

Research Methods
The CRISP-DM (Cross-Industry Standard Process for Data Mining) model provides an overview of the lifecycle of a data mining project. CRISP-DM consists of six stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment; the research steps following this method are shown in Figure 1. The first stage, business understanding, establishes the purpose of the program to be built. The second stage, data understanding, analyzes the data, here a dataset from Kaggle.com [15]. The third stage, data preparation, cleans and prepares the data, because good input data is needed to produce good results [16]. In the fourth stage, modeling, the data is processed with BERT, which is fine-tuned by adjusting hyperparameters when training begins; fine-tuning searches for a configuration that works well on the dataset [17][18]. The fifth stage, evaluation, measures the accuracy of the resulting model; a confusion matrix is used to visualize and count the correct and incorrect predictions made by a classification model [19]. In the final stage, deployment, the model is put into use. CRISP-DM has also served as the basis of a data mining model for predicting assembly cycle time: that solution was evaluated using accuracy and uses industrial data to estimate the assembly time of a finished product at the quotation stage [20].

Business Understanding
At the business understanding stage, the aim of this program is to classify sentences related to LGBT issues in order to determine public opinion, analyzing whether each sentence falls into the contra (positive) or pro (negative) category toward LGBT.

Data Understanding
In the data understanding stage, the research used a dataset from Kaggle.com, accessed on May 30, 2023. This dataset contains YouTube comments about LGBT and has two attributes: "comment", which holds the comment text, and "sentiment", which holds the label of the comment. A total of 1,493 data points were used, split into 80% training data, 10% validation data, and 10% testing data. The dataset attributes are shown in Figure 2.
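The 80%/10%/10% split can be sketched as a seeded shuffle followed by slicing. This is a minimal illustration, not the authors' code: the seed (42), the function name `split_dataset`, and the rounding rule (training and validation sizes rounded down, remainder to testing) are assumptions, since the exact subset sizes are not reported. In practice each row would be a (comment, sentiment) pair.

```python
import random


def split_dataset(rows, train_pct=80, val_pct=10, seed=42):
    """Shuffle rows and split them into train/validation/test lists.

    Integer percentages avoid floating-point rounding surprises; whatever
    remains after the train and validation slices becomes the test set.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n = len(rows)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(1493))
print(len(train), len(val), len(test))  # 1194 149 150
```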

Data Preparation
The data preparation stage aims to obtain good data so that the model produces good results. To do this, a preprocessing stage is carried out, consisting of case folding, data cleansing, tokenizing, stopword removal, stemming, and normalization. The preprocessing stage is shown in Figure 4.

Case Folding
This stage converts uppercase letters to lowercase. Table 1 shows an initial sentence and its case folding result.

Data Cleansing
The data cleansing stage removes elements of sentences that carry no information, such as emojis, hashtags, numbers, punctuation marks, and excessive spaces. Table 2 shows the case folding result and the corresponding data cleansing result: the output of case folding is the input to data cleansing, and in the example sentences in Table 2, punctuation marks and emojis are removed.
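Case folding and data cleansing can be sketched with Python's `re` module. The exact patterns below are assumptions, since the cleansing rules are not listed: hashtags are removed whole, and anything other than lowercase ASCII letters and whitespace (punctuation, emojis, other symbols) is treated as noise. This suits informal Indonesian text but would also strip accented letters.

```python
import re


def case_fold(text):
    """Case folding: convert all uppercase letters to lowercase."""
    return text.lower()


def clean(text):
    """Data cleansing on case-folded text (assumed rule set, see above)."""
    text = re.sub(r"#\w+", " ", text)         # drop whole hashtags, e.g. "#lgbt"
    text = re.sub(r"\d+", " ", text)          # drop numbers
    text = re.sub(r"[^a-z\s]", " ", text)     # drop punctuation, emojis and other symbols
    text = re.sub(r"\s+", " ", text).strip()  # collapse excess whitespace
    return text


# "\U0001F621" is an angry-face emoji.
print(clean(case_fold("Aku TIDAK setuju!!! \U0001F621 #LGBT 2023")))  # aku tidak setuju
```

Hashtags are removed before numbers and symbols so that a tag such as `#lgbt` disappears entirely instead of leaving the bare word `lgbt` behind.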

Tokenizing
At the tokenizing stage, the text is divided into sentences and word units. Table 3 shows the data cleansing result and its tokenizing result: each cleaned sentence is broken into word units.

Stopword and Stemming
The stopword removal stage deletes words that are not needed, and the stemming stage reduces words to their basic (root) form. The stemming process uses Sastrawi, an Indonesian stemming library. Table 4 shows the tokenizing result and the combined stopword removal and stemming result: unnecessary words are removed and the remaining words are changed into their basic form.
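Tokenizing, stopword removal, and stemming can be chained as below. This is a toy sketch: the stopword set, root dictionary, and affix lists are tiny hypothetical stand-ins, and `stem` is a simplified dictionary-based affix stripper, not the algorithm of a real Indonesian stemmer such as Sastrawi. It ignores Indonesian sound changes (for example, "menolak" from "tolak" is not recovered). The negation "tidak" is deliberately left out of the stopword set, because dropping it would flip the meaning of a comment.

```python
# Hypothetical mini resources for illustration only; a real pipeline would use
# a full Indonesian stopword list and a proper stemmer.
STOPWORDS = {"yang", "dan", "ini", "itu", "saya"}
ROOTS = {"dukung", "setuju", "benar", "tolak"}
PREFIXES = ("meng", "men", "mem", "me", "ber", "di", "ter", "ke", "se")
SUFFIXES = ("nya", "kan", "an", "i")


def tokenize(text):
    """Split a cleaned sentence into word units."""
    return text.split()


def remove_stopwords(tokens, stopwords=STOPWORDS):
    return [t for t in tokens if t not in stopwords]


def stem(word, roots=ROOTS):
    """Toy dictionary-based stemmer: strip at most one suffix and one prefix
    and return the first candidate found in the root dictionary."""
    candidates = [word]
    candidates += [word[: -len(s)] for s in SUFFIXES if word.endswith(s)]
    for c in list(candidates):
        candidates += [c[len(p):] for p in PREFIXES if c.startswith(p)]
    for c in candidates:
        if c in roots:
            return c
    return word  # leave unknown words unchanged


tokens = remove_stopwords(tokenize("saya tidak mendukungnya dan itu benar"))
print([stem(t) for t in tokens])  # ['tidak', 'dukung', 'benar']
```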

Normalization
In the normalization stage, non-standard words in the dataset are converted into standard words that follow correct spelling. This is needed because many sentences use non-standard words such as "enggak", "ngga", and "ga". Without normalization, the system would treat "ga", "enggak", "ngga", and "gak" as different words, even though they all have the same meaning, namely "tidak" (not). The normalization result is shown in Figure 5.
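Dictionary-based normalization can be sketched as a token-by-token lookup. Only the four variants of "tidak" come from the text above; the dictionary name `NORMALIZATION` is hypothetical, and a real normalization dictionary would be much larger.

```python
# Hypothetical slang dictionary; the four entries for "tidak" come from the
# examples in the text, a real dictionary would be much larger.
NORMALIZATION = {
    "ga": "tidak",
    "gak": "tidak",
    "ngga": "tidak",
    "enggak": "tidak",
}


def normalize(tokens, mapping=NORMALIZATION):
    """Replace each non-standard token with its standard form, if known."""
    return [mapping.get(t, t) for t in tokens]


print(normalize(["aku", "gak", "setuju"]))  # ['aku', 'tidak', 'setuju']
```

Because all four variants map to one token, later stages (the lexicon and the model vocabulary) see a single word instead of four.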

Lexicon-Based Labeling
Lexicon-based labeling is a sentiment analysis method in which sentiment is determined by analyzing the words in a text according to the sentiment values associated with those words. The lexicon serves as the reference for deciding whether a word or sentence has positive, neutral, or negative sentiment. If a text has a polarity value > 0, it is classified as positive; if its polarity value = 0, it is classified as neutral; and if its polarity value < 0, it is classified as negative.
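The labeling rule can be sketched as follows, assuming (as is common for lexicon-based methods, though not stated here) that a text's polarity is the sum of the scores of its words. The word scores in `LEXICON` are hypothetical illustration values, not entries from the lexicon actually used.

```python
# Hypothetical word scores for illustration; a real lexicon is far larger.
LEXICON = {"bagus": 3, "baik": 2, "suka": 2, "buruk": -2, "benci": -4}


def polarity(tokens, lexicon=LEXICON):
    """Polarity of a text: sum of the scores of its words (unknown words score 0)."""
    return sum(lexicon.get(t, 0) for t in tokens)


def label(tokens, lexicon=LEXICON):
    """Apply the threshold rule: > 0 positive, = 0 neutral, < 0 negative."""
    score = polarity(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label(["videonya", "bagus"]), label(["halo"]), label(["bagus", "buruk", "buruk"]))
# positive neutral negative
```

A plain sum does not handle negation: "tidak bagus" still scores as positive, which is one reason lexicon labels are noisy training targets.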

Modelling
The BERT implementation stage consists of preprocessing, labeling, dataset splitting, sentiment classification, pre-training, fine-tuning, training, and evaluation, and is organized into data preparation and class initialization. In data preparation, the dataset must be converted into the input format BERT accepts before training. BertTokenizer is needed to tokenize sentences and generate the appropriate input, because BERT uses a specific vocabulary that depends on the model used; at the model-loading stage, the tokenizer converts sentences into BERT's input representation. Next, in the class initialization stage, the sentiment dataset and sentiment data loader classes are initialized. For training, the dataset is divided into three parts: training, validation, and testing. The training set is used to fit the model, the validation set to monitor its performance during training, and the test set to measure its performance on data it has not seen. The stage for determining the optimizer is shown in Figure 7.
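The roles of the dataset and data loader classes can be sketched without any deep learning library. Everything here is hypothetical: `make_toy_encoder` stands in for the IndoBERT tokenizer (the real one produces WordPiece subword ids from a fixed vocabulary), and the label order in `LABEL2INDEX` is an assumption. The sketch shows the shape of BERT input: each comment becomes `[CLS] ... [SEP]` token ids, each batch is padded to its longest sequence, and an attention mask marks real tokens (1) versus padding (0).

```python
LABEL2INDEX = {"positive": 0, "neutral": 1, "negative": 2}  # assumed label order


def make_toy_encoder():
    """Hypothetical stand-in for the IndoBERT tokenizer: whole-word ids assigned
    on the fly, wrapped in [CLS] ... [SEP] like BERT input."""
    vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2}

    def encode(text):
        ids = [vocab.setdefault(w, len(vocab)) for w in text.split()]
        return [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]

    return encode


class SentimentDataset:
    """Holds (token ids, label index) pairs, one per comment."""

    def __init__(self, texts, labels, encode):
        self.items = [(encode(t), LABEL2INDEX[y]) for t, y in zip(texts, labels)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]


def iterate_batches(dataset, batch_size, pad_id=0):
    """Yield (input_ids, attention_mask, labels), padding each batch to its longest sequence."""
    for start in range(0, len(dataset), batch_size):
        chunk = [dataset[i] for i in range(start, min(start + batch_size, len(dataset)))]
        longest = max(len(ids) for ids, _ in chunk)
        input_ids = [ids + [pad_id] * (longest - len(ids)) for ids, _ in chunk]
        attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids, _ in chunk]
        labels = [y for _, y in chunk]
        yield input_ids, attention_mask, labels


ds = SentimentDataset(["bagus sekali", "buruk"], ["positive", "negative"], make_toy_encoder())
print(next(iterate_batches(ds, batch_size=32)))
# ([[1, 3, 4, 2], [1, 5, 2, 0]], [[1, 1, 1, 1], [1, 1, 1, 0]], [0, 2])
```

A real pipeline would also truncate sequences to the model's maximum length and convert the lists to tensors before feeding them to the model.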

Evaluation
The evaluation stage measures how well the created model classifies comments, using a confusion matrix. A confusion matrix visualizes and counts the correct and incorrect predictions made by a classification model and is used to measure machine learning classification performance. The resulting confusion matrices are shown in Figure 9 and Figure 10. This stage compares the results on the validation data and the testing data, as follows.
1. Validation Data
This stage discusses the validation results, using the confusion matrix to measure how many comments in the validation data the model predicts correctly. From the validation confusion matrix in Figure 9, 297 comments fall in the True Positive (TP) category, that is, positive comments predicted as positive. The accuracy on the validation data reached 71%, the macro-average precision reached 60%, and the macro-average recall reached 60%, as shown in Table 5. The macro-average F1-score also reached 60%. After completing the trials on the validation data, the next step is to run the trials on the testing data.

2. Testing Data
The final stage covers the testing results, using the confusion matrix to measure how many comments in the testing data the model classifies correctly, and thus how well the system classifies positive, neutral, and negative comments. The confusion matrix is used to measure the performance of machine learning classification, and its results for the testing data can be seen in Table 6. Precision is the percentage of the system's predictions for a class that are correct, and recall is the percentage of the actual members of a class that the system correctly identifies. In validation, precision is 83% and recall is 79%. Accuracy, the percentage of all inputs correctly predicted by the neural network, is 71% in validation. The F1-score is the harmonic mean of precision and recall; in validation it is 81%, as can be seen in Figure 9 and Figure 10. In testing, precision is 86%, recall is 78%, and accuracy is 74%. This 74% accuracy is attributed to the limited amount and low complexity of the training data; more complex training data is expected to increase accuracy. The F1-score in testing is 82%, as can be seen in Table 7.
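The metric definitions above can be made concrete with a short sketch that computes per-class precision, recall, and F1, their macro averages (the averaging reported in Table 5, where every class counts equally), and accuracy from a confusion matrix. The function names are hypothetical. The off-diagonal counts of Figures 9 and 10 are not reported, so the example only checks that the reported F1-scores agree with the harmonic mean of the reported precision and recall.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def classification_report(cm):
    """Per-class precision/recall/F1, macro averages and accuracy.

    cm[i][j] counts comments whose actual class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column sum: everything predicted as c
        actual = sum(cm[c])                          # row sum: everything truly in c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        per_class.append((p, r, f1(p, r)))
    return {
        "per_class": per_class,
        "macro_precision": sum(p for p, _, _ in per_class) / k,
        "macro_recall": sum(r for _, r, _ in per_class) / k,
        "macro_f1": sum(f for _, _, f in per_class) / k,
        "accuracy": sum(cm[c][c] for c in range(k)) / total,
    }


# The reported F1-scores match the harmonic mean of the reported precision and recall.
print(round(f1(0.86, 0.78), 2), round(f1(0.83, 0.79), 2))  # 0.82 0.81
```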
Compared to previous research using the BERT algorithm, the F1-score in this research is higher, at 82%. Together with the predominance of comments classified in the positive (contra) category, this suggests that most of the Indonesian commenters in this dataset do not support LGBT. The neutral class performs considerably worse than the other labels in both validation and testing. This is because the training data does not sufficiently represent the variation within the neutral category, so the model is poorly trained to recognize patterns related to this category.

Figure 1. Research flow using the CRISP-DM method

Figure 2. Attributes in the Dataset

Figure 6. Implementation of IndoBERT
BERT is fine-tuned by adjusting hyperparameters when the training stage begins; the fine-tuning process searches for a configuration that works on the dataset. The hyperparameters used in the training process are:
1. Batch size: the number of samples fed into the network before the weights are updated. The larger the batch size, the longer it takes to process one batch.
2. Epoch: the number of times the network sees the entire dataset. One epoch is complete when every training instance has passed through the network in both the forward and the backward pass.
3. Learning rate: the training parameter that scales the weight correction applied during training. A larger learning rate makes larger weight updates, which can speed up training but can also make it unstable.
The hyperparameters recommended for fine-tuning, which indicate optimal performance for specific tasks, are:
1. Batch size: 32
2. Learning rate: 2e-5
3. Epochs: 10
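To make the batch size and epoch definitions concrete, the following arithmetic sketch counts how many weight updates the reported configuration performs. The training-set size is an assumption: 80% of 1,493 comments rounded down (1,194), since the exact count is not reported. The name `optimizer_steps` is hypothetical.

```python
# Fine-tuning hyperparameters reported in the text.
CONFIG = {"batch_size": 32, "learning_rate": 2e-5, "epochs": 10}


def optimizer_steps(n_train, batch_size, epochs):
    """Weight updates per epoch and in total; the last batch of an epoch may be smaller."""
    per_epoch = (n_train + batch_size - 1) // batch_size  # integer ceiling division
    return per_epoch, per_epoch * epochs


# Assumed training-set size: 80% of 1,493 rounded down.
n_train = 1493 * 80 // 100
print(optimizer_steps(n_train, CONFIG["batch_size"], CONFIG["epochs"]))  # (38, 380)
```

Halving the batch size to 16 would roughly double the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.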

Figure 7. The Stage for determining the optimizer
Figure 7 shows the optimizer, which is used to minimize the loss function on the dataset during training. The optimizer used is Adam, with a learning rate of 2e-5, and the number of epochs is 10; an epoch is one full pass of the network over the entire dataset. Training is carried out with two datasets, the training set and the validation set, as shown in Figure 7.
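The Adam update rule can be illustrated for a single scalar parameter. Only the learning rate (2e-5) comes from the text; the decay rates `beta1=0.9`, `beta2=0.999` and `eps=1e-8` are the commonly used defaults and are assumed here. Assuming a PyTorch setup (not stated in the text), the equivalent library call would be `torch.optim.Adam(model.parameters(), lr=2e-5)`; the sketch below only shows what one update computes.

```python
import math


def adam_step(theta, grad, m, v, t, lr=2e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    m and v are running estimates of the gradient's first and second moments;
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


theta, m, v = adam_step(1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(theta)  # the first step moves theta down by roughly lr (2e-5)
```

Because the update is divided by the square root of the second-moment estimate, its size is close to the learning rate regardless of the gradient's scale, which is why a small value such as 2e-5 keeps fine-tuning from overwriting the pretrained IndoBERT weights too quickly.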

Figure 8. Training History
As Figure 8 shows, the model's accuracy on the training data is better than on the validation data. The learning curve shows accuracy increasing on the training data, while the validation accuracy is high but tends to decrease as training proceeds. After fine-tuning, the model is tested on sample sentences to check that its predictions are correct.
In the validation confusion matrix (Figure 9), 22 comments fall in the True Neutral (TNt) category, that is, neutral comments predicted as neutral, and 100 comments fall in the True Negative (TN) category, that is, negative comments predicted as negative.

Figure 9. Confusion Matrix Diagram for Validation Data

Figure 10 shows that 144 comments fall in the True Positive (TP) category, that is, positive comments predicted as positive; 13 fall in the True Neutral (TNt) category, neutral comments predicted as neutral; and 60 fall in the True Negative (TN) category, negative comments predicted as negative.

Figure 10. Confusion Matrix Diagram for Testing Data

Table 5. Results of Validation Data Accuracy Report