Refining Content-Based Segmentation for Prediction of Coffee Bean Quality

Coffee has substantial economic value and is a key source of foreign exchange for numerous nations, including Indonesia. Moreover, it is a primary livelihood for many of the country's farmers. Accurately predicting the quality of coffee beans remains challenging, primarily due to the time, inconsistency, and imprecision of manual grading. Consequently, this study investigates the application of region-growing segmentation and content-based image retrieval (CBIR) techniques to enhance the prediction of coffee bean quality. The proposed hybrid approach, which combines region growing and CBIR methods, aims to improve the precision of coffee bean quality prediction. Additionally, the research introduces an automated tool that employs these hybrid techniques for quality prediction. The experiments used a dataset of 400 premium and 400 low-quality coffee bean images sourced from Syiah Kuala University in Indonesia. The results demonstrate a precision rate of 85.4%, a significant improvement over several previous studies.


Introduction
Coffee is a commodity with high economic value, and it plays a vital role as a source of foreign exchange for several countries, including Indonesia. Furthermore, it serves as a source of livelihood for most farmers in the country. One of the major problems affecting small-scale farmers is the low quality of their products, which do not meet the standard requirements [1]. Several reports also state that selecting coffee beans against the specified standard is important because the price depends on the quality. However, an expert is needed to determine the grade, which can be classified into good and fine quality. Coffee bean quality standards were originally enforced using the triage system; this was changed to the defect value system based on the decision of the International Coffee Organization (ICO) on October 1, 1983. In this method, the higher the number of defects, the lower the quality of the coffee, and vice versa [2].
The most straightforward technique for detecting bean defects is direct observation. However, manual selection has several drawbacks due to subjectivity and a lack of scientific grounding. In determining quality, it is essential to have the right system to analyze the problem. The content-based image retrieval (CBIR) method is an effective solution to the problem of selecting coffee beans against the standardization requirements, and it can be used to detect the texture of the product. This is often done by taking texture feature values based on the Gray Level Co-occurrence Matrix (contrast, correlation, homogeneity, and energy) obtained from bean images. The CBIR method helps produce coffee beans that comply with the standardization requirements for defects. Several studies have been carried out, such as [3] [4] [5], which used the CBIR technique with 90 image data and achieved a precision rate of 55.20%. The study in [3] also used Arabica coffee bean samples from several regions of Indonesia, including South Kalimantan, North Sumatra, East Java, and West Java, as used in our work. Our experiments achieve a precision of 85.4%, surpassing the 55.20% reported in that earlier work and demonstrating an improvement over several prior studies.
Three coffee bean types are commonly used: Robusta, Arabica, and Liberika; in this work, we deployed Arabica coffee beans [6]. A weakness of the study in [6] was the small amount of image data used, namely 30 samples of each coffee type; hence, the accuracy obtained was relatively low. Meanwhile, [7] used the Analytical Hierarchy Process (AHP) method on Arabica coffee, where the output was quality with an accuracy rate of 85%, while the inputs included moisture content, bean defects, and land height. The weakness of that study was the 15% error, in the form of a discrepancy between manually calculated data and the totals obtained from the AHP method. [8] used the backpropagation artificial neural network (ANN) method with 238 data, of which 138 and 100 were used for training and testing, respectively. The results showed a best accuracy of 80% on a total of 10 test data. Therefore, this study aims to examine the accuracy of a prototype content-based image retrieval system combined with a segmentation method. The results are expected to contribute to the development of digital image processing science and assist interested parties in identifying bean quality, especially coffee.

Data Collection
The dataset used in this study was taken from Syiah Kuala University (https://comvis.unsyiah.ac.id/usk-coffee/) and consisted of 400 low-quality (defective, "bad") coffee bean images and 400 premium-quality ("good") coffee bean images. The quality of the samples was determined using the defect value. The dataset was a collection of interconnected images stored together and used to provide further information. The system created in this study aimed to find similarities among coffee beans through content-based image retrieval (CBIR) and image segmentation. The first step involved segmenting the manually cropped image and separating it from the background. The image was then subjected to resizing and grayscale conversion in the preprocessing stage. The system then performed feature extraction to obtain the characteristics of the input image, which were compared with the data stored in the training database. This was carried out using texture feature extraction with the Gray Level Co-occurrence Matrix (GLCM). Feature vectors in the database with the closest distance values were then retrieved. Based on the results, the system could distinguish between the good and bad categories of coffee.
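To make this flow concrete, the following minimal MATLAB sketch outlines the main steps; the file name, target size, and database variables are illustrative assumptions rather than the authors' actual code.

    % Illustrative skeleton of the retrieval workflow (assumed names and values).
    Img  = imread('coffee_bean.jpg');         % hypothetical, manually cropped bean image
    Img  = imresize(Img, [200 200]);          % preprocessing: uniform size
    G    = rgb2gray(Img);                     % preprocessing: grayscale conversion
    glcm = graycomatrix(G, 'Offset', [0 1]);  % 0-degree co-occurrence matrix
    f    = graycoprops(glcm);                 % contrast, correlation, energy, homogeneity
    query = [f.Contrast f.Correlation f.Energy f.Homogeneity];
    % The query vector is then compared against the stored training vectors using
    % the Euclidean distance (see the matching section below).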

Region Growing Segmentation Algorithm
This study used the region-growing segmentation method to separate the desired region from the unnecessary parts. According to [9] and [10], region-growing segmentation is a computer vision and image processing technique for segmenting an image into meaningful regions based on specific similarity criteria. The main idea behind region growing is to start with a seed pixel or region and iteratively expand the region by adding neighboring pixels that satisfy specific similarity conditions. The process involves the following steps: seed selection, a pixel similarity criterion, the region-growing process, and stopping conditions. The region-growing segmentation (RGS) algorithm is illustrated in Figure 2.

Figure 2. Region-growing segmentation
Furthermore, as mentioned in [11], the region-growing technique can be outlined in several stages: i) initially, a threshold value is determined and the seed point of the image is established as the starting phase of region formation; ii) each neighboring pixel of the region is then examined to assess its similarity to the seed pixel; iii) when similarity is identified, the pixel is added to the region, whereas if no similarity is found, the process rechecks the remaining pixels to identify any that meet the criteria; iv) the growth of a region concludes when no additional pixels fulfill the inclusion criteria, leading to the creation of a new region; v) the process is then complete.
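As an illustration of stages i)-v), the following MATLAB sketch grows a region from a single seed pixel using a fixed intensity threshold; the function name, seed location, and threshold are assumptions for demonstration and not part of the original system.

    % Minimal region-growing sketch (stages i-v above); illustrative only.
    % I: grayscale image; (seed_r, seed_c): seed pixel; T: similarity threshold.
    function mask = region_grow(I, seed_r, seed_c, T)
        I = double(I);
        mask  = false(size(I));            % the region being grown
        stack = [seed_r seed_c];           % candidate pixels awaiting examination
        seedVal = I(seed_r, seed_c);
        while ~isempty(stack)
            r = stack(end, 1); c = stack(end, 2);
            stack(end, :) = [];
            if r < 1 || c < 1 || r > size(I, 1) || c > size(I, 2) || mask(r, c)
                continue;                  % outside the image or already included
            end
            if abs(I(r, c) - seedVal) <= T     % similarity criterion (stage ii)
                mask(r, c) = true;             % grow the region (stage iii)
                stack = [stack; r+1 c; r-1 c; r c+1; r c-1];  % queue 4-neighbors
            end
        end
    end
    % Example call with an assumed seed and threshold:
    %   mask = region_grow(Img_gray, 100, 100, 20);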

CBIR System Design
As introduced by [12], this subsection outlines the overall system workflow. The content-based image retrieval (CBIR) architecture is a framework designed to enhance the efficiency and precision of image searches based on visual content characteristics. At its core, CBIR relies on a structured approach, starting with preprocessing and indexing the images in a database. Visual features such as color, texture, shape, and spatial information are extracted from each image, forming the foundation for subsequent content-based comparisons. Users interact with the system through a query interface, inputting preferences or example images to initiate searches. The CBIR system then measures the similarity between the visual features of the query and those of the images in the database, often using distance metrics such as the Euclidean distance or cosine similarity. The top-ranked images are retrieved and presented to the user. Advanced CBIR architectures may integrate machine learning for adaptive learning, support multiple modalities, address scalability and efficiency concerns, and consider security and privacy. Continuous evaluation and benchmarking ensure the system's competitiveness and effectiveness in delivering visually relevant results across diverse applications and domains. The preprocessing stage was carried out to process the image data and improve its quality. The first step was cropping, which selects and trims the coffee bean image obtained from data acquisition. In this study, the cropping process was performed manually in order to maintain quality and precision in selecting the parts to be cut or retained; manual cropping allows decisions that consider visual composition, focus, and image context more accurately. Resizing was then performed to equalize the sample sizes and speed up the calculation process. The final stage was grayscale conversion, which changes the colors to gray: by setting the RGB components of each pixel to the same value, a grayscale matrix is obtained.

Figure 3. Preprocessing
As summarized in Figure 3, the preprocessing stage thus comprises manual cropping, resizing to a uniform size, and grayscale conversion, in which the RGB values of each pixel are collapsed into a single gray value, yielding a grayscale matrix that represents the image in shades of gray.
Figure 3 can be explained as follows: in the initial phase of preprocessing, the training image goes through two stages, namely the resize stage, which adjusts the image to a uniform size, and the grayscale stage, which transforms the RGB image into grayscale. Texture feature extraction is then performed based on the texture of the seeds, and the database stores the data resulting from this texture feature extraction.
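A minimal MATLAB sketch of these preprocessing steps is shown below; the crop rectangle is an assumption for illustration, since the paper performs cropping manually.

    % Illustrative preprocessing: crop, resize, grayscale (assumed values).
    Img      = imread('bean_raw.jpg');         % hypothetical acquired image
    rect     = [50 50 200 200];                % assumed crop rectangle [x y w h]
    Img_crop = imcrop(Img, rect);              % done manually in this study
    Img_rsz  = imresize(Img_crop, [200 200]);  % uniform sample size
    Img_gray = rgb2gray(Img_rsz);              % grayscale matrix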

Texture Extraction
After the preprocessing step, the next step is texture feature extraction. Texture is a pattern regularity formed by the arrangement of pixels in an image: an image is said to contain texture information when a pattern or characteristic appears repeatedly in an area at intervals of a certain distance and direction [13]. Moreover, as described in [14], the GLCM records the number of pixel pairs as a function of their frequency at a distance d and inclination angle θ, from which the feature values are calculated [15]. Four statistical values are commonly used in this matrix, namely contrast, correlation, energy, and homogeneity, computed at angles of 0˚, 45˚, 90˚, and 135˚ [16][17].
The GLCM flow in Figure 4 starts from the image preprocessing results, constructs the co-occurrence matrix, calculates the statistical features from the 0˚, 45˚, 90˚, and 135˚ co-occurrence matrices, and finally averages the statistical characteristics of the co-occurrence matrices.

Figure 4. GLCM Diagram
The depiction in Figure 4 can be described as follows: after the image preprocessing phase is complete, the next step is the extraction of the GLCM texture features. The transformation of the grayscale image begins with the matrix normalization process, within which several steps are executed: quantization and generation of the count matrix, transposition, summation of the count matrix with its transpose, and finally the normalization procedure.
This sequence of operations constitutes a pivotal stage in the overall process. The outcomes of the matrix normalization were then used to compute the contrast, correlation, homogeneity, and energy attributes. The values obtained at the four angles were subsequently averaged, combining them into a single value per feature. The experiments were run on a laptop with an Intel® Core™ i3-2330M CPU clocked at 2.20 GHz, 2048 MB of RAM, a 64-bit operating system, and a 128 GB SSD.
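The computation just described can be sketched in MATLAB as follows; the variable names follow the earlier preprocessing sketch, the 8 gray levels follow the quantization described later, and the snippet is illustrative rather than the authors' code.

    % Illustrative GLCM feature extraction at 0, 45, 90, and 135 degrees.
    offsets = [0 1; -1 1; -1 0; -1 -1];        % graycomatrix offsets for the four angles
    glcms = graycomatrix(Img_gray, 'Offset', offsets, 'NumLevels', 8, ...
                         'Symmetric', true);   % 'Symmetric' adds the transposed counts
    stats = graycoprops(glcms, {'Contrast','Correlation','Energy','Homogeneity'});
    % Average each feature over the four angles into a single value per feature.
    feat = [mean(stats.Contrast) mean(stats.Correlation) ...
            mean(stats.Energy)   mean(stats.Homogeneity)];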

Result and Discussion
This chapter explains the general workflow of the system. Some of the program functions and methods used in this study, and their implementation in the simulation software, were described at the design stage in the previous chapter. The steps taken to translate the design were based on the results of the analysis, expressed in a language understood by the computer, together with the application of the software in actual situations. In the search for similarities, the images were resized and grayscaled using MATLAB functions such as Img_rsz = imresize(...). The image could be recombined into RGB using the MATLAB function ImgRGB = cat(3, R, G, B). Figure 5 illustrates the complete MATLAB GLCM code.

• Pixel analysis was carried out on adjacent pixels based on the defined criteria. If adjacent pixels met the criteria, they were labeled 1, while the others were labeled 0.
• The pixels labeled 1 were saved in the output image as a region.
• In the image f(x, y), the pixels labeled 0 were located and analyzed as in the two preceding points.
• The pixels labeled 0 were saved as a region.

GLCM Feature Extraction
The sample image was converted into a smaller matrix, namely 3x3 with an angle of 0˚, to facilitate manual calculation. At this stage, the first step was calculating the image matrix and normalizing the results. Before normalizing the matrix, a working area was needed, obtained from the function glcm1 = graycomatrix(Img_gray, 'Offset', [0 1]). The GLCM extraction stages are explained in Table 1 [19].
i. The quantization matrix stage: the matrix results obtained were quantized with 8 gray levels over the range 0-255, as shown in Table 1. ii. The count matrix stage: the calculation was carried out in the horizontal direction, and the results are presented in Figure 9. After the 3x3 matrix had been successfully normalized, the next step was calculating the features, as shown in Table 2. Table 2 shows the results of retrieving the texture features using MATLAB functions at all GLCM angles. The next step was calculating the average of all values over these angles using a MATLAB function, so that a single value was obtained for each feature to simplify the matching stage. Contrast, correlation, homogeneity, and energy are commonly used texture metrics in image processing to describe performance and analyze the characteristics of textures within an image.
Table 3 illustrates the analysis of the coffee bean images and provides values for the various metrics used to assess specific data characteristics. Contrast: the value of 0.2606 indicates the measure of contrast within the data; contrast measures the difference between the light and dark areas in an image. Correlation: with a value of 0.9443, correlation measures the linear relationship between different parts of the data; a higher correlation suggests a strong linear relationship. Homogeneity: the value of 0.9424 signifies how uniform the data's texture is; high homogeneity indicates that the texture is consistent throughout the data. Energy: a value of 0.2924 represents the data's energy or "smoothness"; higher energy values suggest smoother, more uniform data. Precision: a value of 90% implies that, of the items classified as positive, 90% were indeed true positives. Recall: the recall value is 4%, meaning that only a small portion (4%) of the actual positives were correctly identified.
For the second class, in turn: Contrast: the value of 0.2465 suggests a lower level of contrast than in the previous case. Correlation: with a value of 0.890, the linear relationship between different parts of the data is slightly lower than in the previous classification. Homogeneity: the value of 0.9277 indicates that the texture's uniformity is somewhat lower than in the earlier assessment. Energy: this time the energy value is 0.35464, which, by the definition above, suggests a slightly smoother, more uniform texture than in the previous case. The classification achieved 100% precision, implying that all items classified as positive were indeed true positives. Recall: however, the recall is 0%, indicating that no actual positive instances were correctly identified, which is striking given the high precision.
Based on the testing conducted on the query data, with the test repeated 50 times, the average precision was calculated to be 85.4%. This means that, on average, about 85.4% of the positively classified instances were actually true positives. Additionally, the recall was determined to be 0.584, indicating that about 58.4% of the actual positive samples were successfully identified during testing. These metrics collectively provide insight into the performance and characteristics of the data and the classification process used.
The table provides an analysis of two sets of seeds, categorized as "good seeds" and "bad seeds," along with their corresponding precision and recall metrics. The precision for "good seeds" is 0.9: of the items classified as "good seeds," 90% were actually true positives; in other words, when the model classified seeds as "good," it was correct 90% of the time. The precision for "bad seeds" is 1.0, indicating that when the model classified seeds as "bad," it was correct 100% of the time. The recall for "good seeds" is 0.04, meaning that of all actual "good seeds," only 4% were correctly identified; the model missed the remaining 96% of the "good seeds." The recall for "bad seeds" is 0.0, meaning that the model did not correctly identify any of the actual "bad seeds." The model thus has relatively high precision for the "good seeds": when it classifies seeds as "good," it is quite accurate; however, the recall is low, meaning that it missed a significant portion of the actual "good seeds." For the "bad seeds," the model has perfect precision but zero recall: when it does label seeds as "bad," it is always correct, yet it fails to capture the actual "bad seeds." In practical terms, high precision is desirable when false positives must be avoided, as in the case of "bad seeds," where the classification must be certain to be correct. On the other hand, high recall is important when as many true positives as possible must be captured, as in the case of "good seeds," where potential positives should not be missed. In summary, the model is precise for both classes, but its recall is very low; for both "good seeds" and "bad seeds," there is considerable room for improvement in capturing the actual positive instances.
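For reference, precision and recall follow directly from the confusion counts, as in the MATLAB sketch below; the counts shown are placeholders chosen to reproduce the 0.9 precision and 0.04 recall reported for the "good seeds," not the study's actual tallies.

    % Precision and recall from confusion counts (placeholder values).
    TP = 9; FP = 1; FN = 216;        % assumed true positives, false positives, false negatives
    precision = TP / (TP + FP);      % fraction of positive predictions that are correct (0.90)
    recall    = TP / (TP + FN);      % fraction of actual positives recovered (0.04)
    fprintf('precision = %.2f, recall = %.2f\n', precision, recall);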

Calculation of Euclidean Distance and CBIR
Previous research [20] [21] [22] describes matching as a stage in image processing used to find other, similar images. One parameter representing the degree of similarity between samples is the Euclidean distance: the smaller the distance between two images, the higher their similarity. The image comparison was carried out using the Euclidean formula

d(x, y) = √( Σ (xᵢ − yᵢ)² ), with the sum taken over i = 1, …, n,

where d(x, y) represents the Euclidean distance between the vectors x and y in an n-dimensional space; it quantifies the length of the shortest path between the two points, that is, how "far apart" the two vectors are from each other. xᵢ and yᵢ represent the components (values) of the vectors x and y along the i-th dimension; each vector has multiple components, and the formula accumulates the differences between each pair of corresponding components. Figure 10 shows an image precision graph for matching the query, based on the results obtained from the samples, while Figure 14 is a recall graph based on the test results of all query images. The data obtained were compared with the findings of previous studies, which served as references; several related studies on quality determination have been carried out [23][24] [12]. The table suggests that the model's positive predictions become more accurate as one moves from the lower precision values to the higher ones; the model performs very well in precision, particularly for instances with precision values of 90 and 100. Based on the precision values obtained, the model accurately identifies positive cases, which can be crucial in applications where false positives are costly or undesirable, such as medical diagnosis or fraud detection. High precision indicates that when the model predicts a positive outcome, it is likely to be correct, although high precision alone is not a complete assessment.
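A minimal MATLAB sketch of this matching step is given below; the database variables and the number of retrieved images are assumptions.

    % Illustrative matching: rank stored images by Euclidean distance (assumed names).
    % features: N-by-4 matrix of stored GLCM vectors; query: 1-by-4 query vector.
    d = sqrt(sum((features - query).^2, 2));   % distance from the query to every image
    [~, idx] = sort(d, 'ascend');              % smallest distance = highest similarity
    topK = idx(1:5);                           % e.g., retrieve the five closest images
    disp(labels(topK));                        % their quality labels ('good'/'bad')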
However, it is important to consider whether the dataset has a class imbalance. If the positive class is significantly smaller than the negative class, achieving high precision may be easier because there are fewer positive instances to predict. A sudden increase in precision from 80 to 90 and then to 100 may indicate a change in the model, the data, or the problem-solving approach at those points; investigating these changes could provide valuable insights for improving the model further. Finally, the table indicates that the model performs well in terms of precision, but other metrics and factors need to be considered for a comprehensive assessment of its performance. Understanding the specific problem, the dataset characteristics, and potential biases is crucial for drawing meaningful conclusions.

Conclusion and Future Works
Based on the comparison with previous studies, it can be concluded that this study obtained good results, with a test precision rate of 85.4%. The implementation of the region-growing algorithm for segmenting objects was successful, and the system can separate objects from their backgrounds. The weakness of this study is that the developed system was not completely accurate, and several errors occurred during the assessment of image similarities. This was because no filter was used to remove noise from the samples; noise has been reported to affect segmentation, which greatly influences the feature extraction results. This study can be continued by adding new image data types, such as Arabica, Gayo, and Liberika coffee, or by comparing beans fermented by mongooses/civets. Further studies are also advised to use a combination of, or comparison with, other methods, such as the SVM algorithm and artificial neural networks, to compare their accuracy levels.

Figure 6. RGB (Red, Green, Blue) matrix values. In Figure 6, the R (Red) matrix has a size of 200x200 pixels; at pixel (1,1) the value is R = 243, and at pixel (200,200) it is 231. The G (Green) matrix is likewise 200x200 pixels; at pixel (1,1) the value is G = 247, and at pixel (200,200) it is 235. Finally, the B (Blue) matrix has a size of 200x200 pixels; at pixel (1,1) the value is B = 246, and at pixel (200,200) it is 233.

Figure 9. Feature extraction. iii. The transpose stage changes the rows of the count matrix into columns. iv. The count + transpose stage: the count matrix and its transpose are summed element by element. v. The normalization stage: all the numbers contained in the count + transpose matrix are summed, 1 + 1 + 1 + 1 + 2 + 2 + 2 + 2 = 12, and each entry is divided by this total. After the 3x3 matrix had been successfully normalized, the next step was calculating the features, as shown in Table 2.
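A small MATLAB sketch reproducing stages iii-v on a hypothetical 3x3 count matrix may make this concrete; the example matrix is an assumption chosen so that the count + transpose entries sum to 12, as in the text.

    % Illustrative count + transpose + normalization (assumed 3x3 count matrix).
    count = [1 1 1;
             0 1 1;
             0 1 0];              % hypothetical horizontal pair counts
    sym   = count + count.';      % stages iii-iv: transpose and summation
    total = sum(sym(:));          % stage v: 1+1+1+1+2+2+2+2 = 12
    glcmN = sym / total;          % normalized GLCM; entries now sum to 1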

Figure 10. The image retrieval precision

It is stated that there are several methods for texture feature extraction:
a. First-order features are based on the characteristics of the image histogram. They generally distinguish macrostructural textures (periodic repetition of local patterns). The first-order features include mean, variance, skewness, kurtosis, and entropy.
b. Second-order features are based on the probability of a neighbor relationship between two pixels at a certain distance and angular orientation. They are generally used to distinguish microstructural textures (where local patterns and their repetition are less pronounced). Second-order features include the angular second moment, contrast, correlation, variance, inverse difference moment, and entropy.
c. Other texture descriptors are Laws' texture measures, wavelets, steerable pyramids, and local binary patterns (LBP).
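As an illustration of the first-order (histogram-based) features, the following MATLAB sketch computes them from a grayscale image; this is a generic sketch under the assumption of 8-bit intensities, not code from the paper.

    % Illustrative first-order texture features from the intensity histogram.
    p  = double(Img_gray(:));                 % flatten pixel intensities
    mu = mean(p);                             % mean
    v  = var(p);                              % variance
    sk = mean((p - mu).^3) / v^1.5;           % skewness
    ku = mean((p - mu).^4) / v^2;             % kurtosis
    h  = histcounts(p, 0:256) / numel(p);     % normalized 256-bin histogram
    ent = -sum(h(h > 0) .* log2(h(h > 0)));   % entropy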

Table 1. Gray value

Table 2. Results of texture feature retrieval

Table 3. Performance Metrics Comparison for Good and Bad Seeds