Network Reduction Strategy and Deep Ensemble Learning for Blood Cell Detection

Identifying and characterizing blood cells is vital for diagnosing diseases and evaluating a patient's health. Blood, consisting of plasma and cells, offers valuable insights through its biochemical and ecological features. Plasma constitutes the liquid component containing water, protein, and salt, while platelets, red blood cells (RBCs), and white blood cells (WBCs) form the solid portion. Due to diverse cell characteristics and data complexity, achieving reliable and precise cell detection remains a significant challenge. This study presents a network reduction strategy (NRS) and deep ensemble learning approaches to detect blood cell types based on the YOLOv8 model. Our proposed methods aim to optimize the YOLOv8 model by reducing network depth while preserving performance and leveraging deep ensemble learning to enhance model accuracy. Based on the experiments, the NRS reduces the complexity of the YOLO model by reducing the depth and width of the network while maintaining model performance, and deep ensemble learning improves detection results by 4%, outperforming the baseline YOLOv8 model.


Introduction
Identifying blood cells is crucial because they are an easily accessible cell population whose shape, biochemistry, and ecology may provide clues to the diagnosis of a disease or to a patient's overall health. Blood is a combination of plasma and cells circulating throughout the body and has a liquid and a solid component. Plasma, the liquid portion, consists of water, protein, and salt, while platelets, red blood cells (RBCs), and white blood cells (WBCs) make up the solid portion. Cells, molecules, proteins, and other components of the blood can all be measured or examined through blood tests, which allow doctors to diagnose illness, track its progression, and choose the best course of treatment. As a result, many blood and tissue samples are brought into medical laboratories and need to be examined as thoroughly and quickly as possible. Precise diagnosis in laboratory medicine depends on counting specific cell populations accurately. Automated blood cell detection typically uses one of two approaches. Traditional approaches require several processing steps, including preprocessing, segmentation, feature extraction, and classification, whereas other methods are based on deep neural networks (DNNs). Due to the wide range of cell characteristics and the complexity of the data, reliable and accurate cell detection remains challenging. A particular cell type, such as a WBC, can be detected in a microscopic image.
In recent years, many studies have applied deep learning in various domains. Loh et al. [1] proposed a Mask R-CNN trained on uninfected and Plasmodium falciparum-infected red blood cells to detect malaria infection. Without sacrificing accuracy, the predictive model generated results 15 times faster than human counting. Vogado et al. [2] proposed a fine-tuned and highly generalisable deep learning model for leukemia diagnosis. Hakim et al. [3] applied YOLO for embryo grading after in vitro fertilization. Deep learning has also been widely applied in other domains: Indrawan et al. [4] proposed an optimization of a CNN model to detect fruit freshness, and Surya et al. proposed deep learning-based methods for Balinese carving recognition [5]-[8], vehicle detection [9], [10], and rice disease recognition [11].
Deep learning has been widely applied in the medical domain to help identify various diseases. Devi and Kumar [12] proposed deep transfer learning to identify diabetic retinopathy using a data augmentation strategy. The study achieved 91% accuracy on synthetic data, a 5% improvement over non-synthetic data. Rahmat et al. [13] proposed a deep convolutional neural network with k-means segmentation to identify glioma based on MRI images. The study achieved 95.5% accuracy in classifying glioma cell type. Xu et al. [14] proposed the ISANET model, which combines a CNN and attention mechanisms for non-small cell lung cancer classification and detection. The attention mechanism is applied to the InceptionV3 model to classify three categories of lung cancer, and the model outperforms the baseline CNN with an accuracy of 98.14%. Another study compared the performance of an artificial neural network (ANN) and a support vector machine (SVM) for detecting cervical abnormalities based on CT images [15]. The study shows that the ANN outperformed the SVM with an accuracy of 95.75%.
YOLO is a single-stage object detector that combines all the elements of the object detection pipeline into a single neural network. It uses features from the entire image to predict class probabilities and bounding box coordinates. Many studies have proposed object detection using YOLO variants. Aly et al. [16] proposed breast mass detection and classification based on YOLOv3. By applying k-means clustering to the dataset, the study can detect most of the challenging cases of masses and classify them correctly. For other medical objects, YOLO variants have been used for the automated detection of COVID-19 cases [17] and for melanoma lesion detection and segmentation [18]. In other domains, YOLO variants have been widely applied to traffic sign recognition [19], small target detection in driving scenarios [20], stomata detection [21], and fish detection and tracking in farms [22]. These studies show that YOLO can be applied well in various domains.
Ensemble learning strategies can improve a model's performance by combining several models. This strategy yields better prediction performance than using only a single model. Surya et al. [23] proposed a network scaling strategy on YOLOv5 for Balinese carving motif detection. Based on the experimental results, the network scaling strategy improved YOLOv5 model performance, achieving 98% on AP50. Nanni et al. [24] proposed an ensemble strategy for bioimage classification that composes multiple CNNs into an ensemble and combines their scores by the sum rule. The study achieved an accuracy of 97%. Another study [25] proposed ensemble models for handwritten digit detection and recognition. A further study suggested an ensemble strategy to improve the Mask R-CNN model for polyp detection [26].
Object identification, classification, and segmentation are just a few of the computer vision applications that have significantly benefited from deep learning. The rapid growth of studies based on deep learning networks across various modalities and applications, including autonomous cell counting, has demonstrated the effectiveness and efficiency of deep learning in medical imaging. Several studies proposed the detection of different types of leukocytes using YOLOv2 and an optimized bag of features [27], the detection of white blood cells using a YOLOv3 network [28], erythrocyte detection based on total contribution score and fuzzy entropy [29], and blood cell detection in blood smear images using an improved Faster R-CNN [30]. Blood cell analysis is a critical task for diagnosing various health issues. Alzubaidi et al. [31] proposed lightweight deep learning models that classify erythrocytes for sickle cell anemia diagnosis. Parab and Mehendale [32] proposed red blood cell classification based on feature extraction from each segmented cell image. The study could detect RBCs accurately and quickly. Alam and Islam [33] applied YOLO, a deep learning model, to blood cell identification.
In this study, we propose a network reduction strategy and deep ensemble learning approaches to detect blood cell types. Blood cell detection is critical for diagnosing abnormalities in blood cells that are responsible for various health issues. The proposed approaches aim to produce a lighter YOLOv8 model by reducing its network depth while maintaining its performance, and to improve that performance through deep ensemble learning. The network reduction strategy reduces the computational complexity of the YOLOv8 model by reducing the network depth, which results in a lighter model. In addition, deep ensemble learning can improve detection performance in the YOLOv8 model by combining the weights of each model variant and eliminating detections with low confidence. The paper is organized as follows: the literature review of related studies is given in Section 2. The explanation of the proposed method is given in Section 3. The results and discussion are provided in Section 4, and the conclusion of this study is given in Section 5.

Research Methods
The following section presents our research methodology and model evaluation. The method consists of dataset preparation, the network reduction strategy, deep ensemble learning, and performance evaluation.

Dataset Preparation
This study uses the publicly available Blood Cell Dataset on Kaggle [34]. The dataset contains 364 images and 4,888 labels across three classes: White Blood Cells (WBC), Red Blood Cells (RBC), and Platelets. Fig. 1 shows a dataset sample. Each label annotates a blood cell with a rectangular bounding box that gives the cell's location and type. The Blood Cell Dataset is a small-scale object detection dataset with an image resolution of 416×416 pixels. In this study, we split the dataset into training, validation, and testing sets in a 70:20:10 ratio.
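A 70:20:10 split of this kind can be sketched as follows. This is a minimal illustration, not the authors' procedure: the shuffling seed, the rounding rule, and the `image_ids` file stems are assumptions, so the exact partition sizes may differ from those used in the experiments.

```python
import random


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items and split them into train/val/test by the given ratios.

    The seed and the rounding rule are illustrative assumptions.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical file stems standing in for the 364 dataset images.
image_ids = [f"image_{i:03d}" for i in range(364)]
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))
```

Assigning the remainder to the test set guarantees that every image lands in exactly one partition even when the ratios do not divide the image count evenly.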
Each class of blood cell types is distributed throughout the dataset. The challenge in detecting blood cell types is that the data are unevenly distributed across classes, so an optimization strategy is required to improve the performance of the detection model. The Red Blood Cell class dominates, exceeding 8,000 images, while the number of White Blood Cell and Platelets images is just under 2,000.
Figure 2. Dataset Distribution

Figure 2 shows the class distribution for each type of blood cell. RBC dominates the blood cell composition, with labels exceeding 8,000, while WBC and Platelets only have 1,000 labels. The imbalanced data is a challenge in detecting blood cell types, so an optimized model is required.

Network Reduction Strategy (NRS)
You Only Look Once (YOLO) is a deep learning model that detects bounding boxes and their object classes in an end-to-end network. In this study, we implemented the Network Reduction Strategy on the YOLOv8 network. The strategy fine-tunes the YOLOv8 network to produce a lighter model by reducing the network depth. Table 1 shows the network reduction strategy in the YOLOv8 model.

Table 1. Network Reduction Strategy on YOLOv8
Model               Network Depth
YOLOv8-35           35%
YOLOv8-60           60%
YOLOv8-Baseline     100%

The Network Reduction Strategy applied three variations of network depth, i.e., YOLOv8-35, YOLOv8-60, and YOLOv8-Baseline, with network depths of 35%, 60%, and 100%, respectively. This strategy aims to produce a lighter YOLOv8 model by reducing its network depth while maintaining its performance. We modified the YOLOv8 configuration file to reduce the network depth. For YOLOv8-35, we changed the network depth, width, and max channels to 0.35, 0.25, and 1024, respectively. For YOLOv8-60, we changed the depth, width, and max channels to 0.60, 0.75, and 768, respectively. YOLOv8-Baseline is the benchmark model that uses 100% of the network depth. Fig. 3 shows the YOLOv8 network architecture for blood cell detection.
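To illustrate how depth, width, and max-channel multipliers of this kind shrink a network, the sketch below applies the scaling rule commonly used by YOLO-style configuration parsers: block repeats are multiplied by the depth factor, and channel counts are capped, multiplied by the width factor, and rounded up to a multiple of 8. The multipliers come from the text above; the rule itself, the `BASE_STAGES` values, and all function names are assumptions for illustration, not the authors' configuration file.

```python
import math


def make_divisible(value, divisor=8):
    """Round value up to the nearest multiple of divisor."""
    return int(math.ceil(value / divisor) * divisor)


def scale_repeats(base_repeats, depth_multiple):
    """Scale the number of repeated blocks; single blocks are left unchanged."""
    if base_repeats <= 1:
        return base_repeats
    return max(round(base_repeats * depth_multiple), 1)


def scale_channels(base_channels, width_multiple, max_channels):
    """Cap the channel count, apply the width factor, and round to a multiple of 8."""
    return make_divisible(min(base_channels, max_channels) * width_multiple)


# Multipliers reported in the text for the two reduced variants.
VARIANTS = {
    "YOLOv8-35": {"depth": 0.35, "width": 0.25, "max_channels": 1024},
    "YOLOv8-60": {"depth": 0.60, "width": 0.75, "max_channels": 768},
}

# Illustrative (repeats, channels) pairs for four backbone stages; assumed values.
BASE_STAGES = [(3, 128), (6, 256), (6, 512), (3, 1024)]

for name, cfg in VARIANTS.items():
    scaled = [
        (scale_repeats(n, cfg["depth"]),
         scale_channels(c, cfg["width"], cfg["max_channels"]))
        for n, c in BASE_STAGES
    ]
    print(name, scaled)
```

Under these assumptions, a lower depth factor removes repeated blocks and a lower width factor narrows every layer, so both settings reduce the parameter count.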

Deep Ensemble Learning
A neural network is a nonlinear approach that offers great flexibility. Because neural networks learn through a stochastic training process, this flexibility has a drawback: the model is sensitive to the particulars of the training data. Each training run may find a different set of weights, which leads to different predictions. Therefore, integrating the predictions from multiple models rather than relying on one can lower the variance and produce better predictions. We propose deep ensemble models based on NRS on YOLOv8. The ensemble model combines the predictions of each model to produce a better final prediction. Fig. 5 shows the proposed deep ensemble learning approach for predicting bounding boxes and object classes on blood cell images. In this study, we combine three models, i.e., YOLOv8-100 (the baseline), YOLOv8-60, and YOLOv8-35. The ensemble model is defined by Formula (1), which computes the combined prediction of the constituent models. In Formula (1), the weight term is the model weight based on the Network Reduction Strategy on the YOLO model, and L is the variation of the image dataset.
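One common way to combine detections from several models is to pool all predicted boxes, discard low-confidence ones, and apply class-wise non-maximum suppression. The sketch below shows that generic scheme only; it is not a reproduction of Formula (1), and the thresholds, the box format, and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# One detection: (x1, y1, x2, y2, confidence, class_id)
Detection = Tuple[float, float, float, float, float, int]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_detections(per_model: List[List[Detection]],
                        conf_thres: float = 0.25,
                        iou_thres: float = 0.5) -> List[Detection]:
    """Pool detections from all models, drop low-confidence boxes, then apply NMS.

    conf_thres and iou_thres are illustrative defaults, not values from the paper.
    """
    pooled = [d for dets in per_model for d in dets if d[4] >= conf_thres]
    pooled.sort(key=lambda d: d[4], reverse=True)
    kept: List[Detection] = []
    for det in pooled:
        # Keep a box unless a higher-confidence box of the same class overlaps it.
        if all(det[5] != k[5] or iou(det, k) < iou_thres for k in kept):
            kept.append(det)
    return kept


# Hypothetical outputs of three model variants for one image.
model_a = [(10, 10, 50, 50, 0.90, 0), (100, 100, 140, 140, 0.20, 1)]
model_b = [(12, 11, 51, 49, 0.80, 0)]
model_c = [(100, 100, 140, 140, 0.60, 1), (11, 10, 50, 50, 0.10, 0)]
final = ensemble_detections([model_a, model_b, model_c])
print(final)
```

In this toy input, the second object is missed by one model at the confidence threshold but recovered by another, which is the kind of gain an ensemble is meant to provide.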

Model Performance Evaluation
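We evaluated the model's performance based on Precision, Recall, and the PASCAL VOC standard metric. The VOC standard metric calculates the average precision (AP) at an intersection over union (IoU) threshold of 0.5. The following formulas define precision, recall, and the VOC standard metric, where TP, FP, and FN denote the numbers of true positive, false positive, and false negative detections, and p(r) is the precision at recall r:

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]

\[ AP = \int_{0}^{1} p(r)\, dr \]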
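As a hedged illustration of these metrics, the sketch below computes precision and recall from detection counts and estimates AP for a single class as the area under the precision-recall curve using all-point interpolation. Which PASCAL VOC interpolation variant the experiments used is not stated, so the interpolation choice, the input format, and the function names are assumptions.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(matches, num_gt):
    """AP for one class from (confidence, is_true_positive) pairs.

    A detection is assumed to count as a true positive when it matches an
    unmatched ground-truth box with IoU >= 0.5. Uses all-point interpolation.
    """
    if num_gt == 0 or not matches:
        return 0.0
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical detections for one class with two ground-truth boxes.
example = [(0.9, True), (0.8, False), (0.7, True)]
print(precision_recall(tp=8, fp=2, fn=2))
print(average_precision(example, num_gt=2))
```

AP50 for the whole detector would then be the mean of the per-class AP values over WBC, RBC, and Platelets.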

Result and Discussion
In this study, we conducted experiments on a local machine with a 12 GB RTX 3060 GPU, CUDA 11.3, and cuDNN 8.2.1. We ran several test scenarios to evaluate the performance of each model. Each scenario applied the NRS and Deep Ensemble Learning strategies to the YOLO variant.
The testing phase was performed on the detection models obtained by applying the Network Reduction Strategy to the YOLOv8 model. Each model is evaluated based on Precision, Recall, and Average Precision at an IoU threshold of 50%, following the VOC standard metric for object detection. Figure 6 shows the Precision-Recall and Average Precision graphs for the three YOLOv8 models trained at the previous stage.
The model evaluation detects the type of blood cells in each blood cell image in the test data. Figure 6 shows the precision-recall and average precision graphs of the Network Reduction Strategy for the three models. The YOLOv8-35 model shows better test results than the baseline YOLOv8 model.

Conclusion
The results demonstrated in the previous section show that our proposed model can outperform the YOLOv8-baseline model and the YOLOv4 variant.

Figure 3. YOLOv8 Network Architecture

Model training for each optimization strategy uses the same hyperparameter configuration so that the performance comparison at the model evaluation stage is fair. The initial learning rate (lr0) is 0.01, and the final learning rate (lrf) is 0.2. Training uses momentum = 0.937 and weight decay = 0.0005. The warmup_epoch, warmup_momentum, and bias_lr settings increase the model updates gradually to avoid optimization instability that can cause divergence and NaN values; they define the initial momentum and bias learning-rate values, which move toward the default values during the warmup period. Figure 4 shows the detailed YOLOv8 network.
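The learning-rate settings can be read as a schedule that decays from lr0 toward a final value. The sketch below assumes the convention, common in YOLO-style trainers, that lrf is a fraction of lr0 and that the decay is linear over the epochs. Both points are assumptions, as is `TOTAL_EPOCHS`, since the number of training epochs and the scheduler type are not stated here.

```python
# Hyperparameters reported in the text.
HYP = {"lr0": 0.01, "lrf": 0.2, "momentum": 0.937, "weight_decay": 0.0005}

TOTAL_EPOCHS = 100  # hypothetical value; the epoch count is not given in the text


def linear_lr(epoch, total_epochs=TOTAL_EPOCHS, lr0=HYP["lr0"], lrf=HYP["lrf"]):
    """Learning rate at a given epoch under a linear decay from lr0 to lr0 * lrf.

    Treating lrf as a fraction of lr0 is an assumption about the trainer's convention.
    """
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    factor = (1.0 - progress) * (1.0 - lrf) + lrf
    return lr0 * factor


for epoch in (0, 50, 100):
    print(epoch, linear_lr(epoch))
```

Under this reading, the rate starts at 0.01 and ends at 0.002. If lrf were instead an absolute final rate, the schedule would rise rather than decay, which is why the fractional interpretation is assumed.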

Figure 5. Proposed Ensemble Model on YOLOv8 Based on Network Reduction Strategy

Figure 6. Precision-Recall and Average Precision Matrices of Network Reduction Strategy: (a) YOLOv8-35, (b) YOLOv8-60, (c) YOLOv8

NRS and Deep Ensemble Learning optimization strategies can improve YOLO detection performance for blood cell types. The NRS strategy reduces the complexity of the YOLO model by reducing the depth and width of the network while maintaining model performance. In addition, the Deep Ensemble Learning strategy improves model performance by eliminating low-confidence detections, improving detection results by 4% based on the experimental results.

RISTEKDIKTI Decree No. 158/E/KPT/2021

Table 2 shows the evaluation results of three detection models.

Table 2 shows the experimental results compared to the benchmark models, i.e., YOLOv4 variants, YOLOv5 variants, and the YOLOv8 baseline. We trained each model on our dataset to compare detection performance fairly. Based on the experiments, our proposed model achieved an Average Precision at an IoU threshold of 0.5 (AP50) of 97.2%. Compared to the baseline YOLOv8 and the YOLOv4 variants, our proposed model achieved a better AP on our imbalanced dataset and outperformed several benchmark YOLO variants.