
IEIESPC (IEIE Transactions on Smart Processing and Computing)

IEIESPC Vol. 15, No. 3, pp. 360-371

ISSN (online): 2287-5255

Received: 3 February 2025; Revised: 25 March 2025; Accepted: 4 April 2025

DOI: 10.5573/IEIESPC.2026.15.3.360

Regular Paper

Res2U-Net: Double Resnet on U-Net for Exudate Segmentation in Retinal Image

Anita Desiani^1,*, Bambang Suprihatin^1, Muhammad Suedarmin^1, Siti Rusdiana Puspa Dewi^2, Akmal Junaidi^3, Muhammad Arhami^4

^1 Department of Mathematics, Universitas Sriwijaya, Indralaya, Indonesia. {anita_desiani, bambangs}@unsri.ac.id, muhammadsuedarmin@gmail.com
^2 Department of Medicine, Universitas Sriwijaya, Indralaya, Indonesia. sitirusdiana@fk.unsri.ac.id
^3 Department of Informatics Engineering, Universitas Lampung, Bandar Lampung, Indonesia. akmal.junaidi@fmipa.unila.ac.id
^4 Department of Informatics Engineering, Politeknik Negeri Lhokseumawe, Aceh, Indonesia. muhammad.arhami@pnl.ac.id

^*Corresponding Author: Anita Desiani, anita_desiani@unsri.ac.id

License :

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited (www.theieie.org).

Abstract

The appearance of exudates on the retina indicates diabetic retinopathy. An accurate segmentation method is needed to detect exudates, both hard and soft. The U-shaped network (UNet) is a widely used segmentation architecture. Because exudates vary in form, a deep architecture is required; however, adding layers to UNet can cause vanishing gradients during training. This study modifies UNet for exudate segmentation by replacing its encoder and decoder with residual blocks. The resulting architecture is named the double residual block on a U-shaped network (RES2U-Net). A residual block lets gradients flow directly across several layers without passing through non-linear operations, which helps overcome vanishing gradients in UNet. The proposed architecture is expected to preserve important information and mitigate vanishing gradients at every layer, so that both hard and soft exudates in retinal images are segmented well. Applied to exudate segmentation, RES2U-Net achieves an accuracy above 95%, and an F1-Score above 0.80 shows that it strikes a good balance in separating exudates from irrelevant features. These results indicate that the proposed architecture provides accurate and valid exudate segmentation of retinal images.

Keywords

Diabetes, Exudate, Health research, Health risk, Segmentation

1. Introduction

The retina is the layer of tissue at the back of the eyeball. It acts as a bridge that converts light entering the eye into signals that are sent to the brain to form vision ^[1]. One disorder of the retina is diabetic retinopathy, a complication of diabetes mellitus that can cause blindness ^[2]. A characteristic sign of diabetic retinopathy is the presence of lesions on the retina called exudates. Exudates appear as irregularly shaped yellow spots on the retina ^[3]; they usually have sharp boundaries and are brighter than the retinal background. Exudates take two forms: soft exudates and hard exudates. Early detection of the disease is essential for preventing vision loss, and exudates in retinal images can be identified through segmentation.

Segmentation separates the objects under study (foreground) from the remaining objects (background). Manual segmentation requires high precision and special skills, and it takes a long time when the amount of data is large. Automatic segmentation addresses these limitations ^[4, 5]. Convolutional Neural Networks (CNNs) can help medical personnel segment images quickly and accurately ^[6], since a CNN can detect and recognize objects in digital images ^[7]. The U-shaped network (UNet) is a standard CNN architecture for image segmentation ^[8]. Xu et al. ^[9] applied UNet to hard exudate segmentation and obtained an F1-Score above 85%, but a sensitivity still below 80%. The study in ^[10] applied UNet to exudate segmentation, but its accuracy and specificity were still below 85%. The study in ^[11] applied UNet only to hard exudate segmentation and obtained a sensitivity and F1-Score below 80%. UNet requires a deep network to learn detailed image features ^[12]. Deepening UNet means adding layers: more convolution layers, more non-linear activation functions, and more downsampling steps. In such a deep network, the weights of the UNet encoder become difficult to update during training, and repeated downsampling can reduce the model's ability to capture important image features in deeper layers. The deep layers also affect the decoder: when the encoder produces a non-optimal feature representation, the up-sampling operations in the decoder cannot fully restore the image to its original size. Non-optimal feature representation can result in vanishing gradients, which hinder the model from learning complex features and thereby reduce overall segmentation performance ^[13].
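The vanishing-gradient argument can be illustrated with a toy scalar sketch. It assumes each layer is a single weight followed by a sigmoid, chosen because the sigmoid's derivative never exceeds 0.25 (the UNet layers discussed here use ReLU and many weights), so it demonstrates only the multiplicative mechanism, not UNet itself. By the chain rule, the gradient reaching the input is a product of one local derivative per layer, and this product shrinks as layers are stacked. The function name `plain_chain_gradient` is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid; its derivative s * (1 - s) never exceeds 0.25."""
    return 1.0 / (1.0 + math.exp(-z))


def plain_chain_gradient(depth: int, weight: float = 1.0, x: float = 0.5) -> float:
    """d(output)/d(input) for `depth` stacked toy layers h <- sigmoid(weight * h)."""
    grad, h = 1.0, x
    for _ in range(depth):
        s = sigmoid(weight * h)
        grad *= weight * s * (1.0 - s)  # chain rule: multiply by each local derivative
        h = s
    return grad


for depth in (1, 5, 10, 20):
    print(f"depth={depth:2d}  gradient={plain_chain_gradient(depth):.3e}")
```

With a unit weight, each factor is below 0.25, so after 20 layers the gradient is below $0.25^{20} \approx 10^{-12}$.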

One way to overcome these problems of UNet is to combine it with another architecture. The Residual Network (ResNet) is built from residual blocks equipped with skip connections, through which the original input bypasses the layers of the block ^[14]. A skip connection is a path that lets data flow directly from the input to a later layer ^[15]. Skip connections keep the original information intact and let gradients pass more efficiently through the network layers during training ^[16], so residual blocks can help prevent vanishing gradients. Residual blocks have been applied to segmentation in several studies; in that setting, several convolution layers are added at the end of the network to convert its output into a segmented image. Several studies have also modified UNet with residual blocks for exudate segmentation in retinal images. The study in ^[17] applied residual blocks to the UNet encoder for exudate segmentation and reported an accuracy above 50% without measuring other performance metrics. The study in ^[18] applied residual blocks to the UNet decoder for hard exudate segmentation; it reported only accuracy and specificity above 85%, while sensitivity remained below 80%. The study in ^[19] performed exudate segmentation using a UNet modified with residual blocks and reported an accuracy above 85% but no other performance metrics.
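The effect of a skip connection on the gradient product can be sketched in the same toy scalar setting, which is an assumption for illustration and not the RES2U-Net block itself. A residual block computes $h + F(h)$, so its local derivative is $1 + F'(h)$ rather than $F'(h)$. With the hypothetical sigmoid layer $F$ and the positive weight used below, the identity path keeps every factor above 1, while the plain chain decays toward zero.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def plain_chain_gradient(depth: int, weight: float = 1.0, x: float = 0.5) -> float:
    """Gradient through `depth` layers h <- F(h), with F(h) = sigmoid(weight * h)."""
    grad, h = 1.0, x
    for _ in range(depth):
        s = sigmoid(weight * h)
        grad *= weight * s * (1.0 - s)
        h = s
    return grad


def residual_chain_gradient(depth: int, weight: float = 1.0, x: float = 0.5) -> float:
    """Gradient through `depth` residual blocks h <- h + F(h), with the same F."""
    grad, h = 1.0, x
    for _ in range(depth):
        s = sigmoid(weight * h)
        grad *= 1.0 + weight * s * (1.0 - s)  # the "1 +" comes from the identity (skip) path
        h = h + s
    return grad


for depth in (5, 20):
    print(depth, plain_chain_gradient(depth), residual_chain_gradient(depth))
```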

This study proposes a modified UNet architecture for exudate segmentation in retinal images, the double residual block on a U-shaped network (RES2U-Net), in which both the encoder and the decoder of UNet are replaced with ResNet residual blocks. The UNet encoder extracts and compresses image features through repeated convolution and downsampling, which can lose feature details, especially low-level features; in addition, the deeper the network, the more non-linear activations there are, which causes vanishing gradients. The skip connection of a residual block provides a direct path that carries low-level features from one layer to the next, so feature details are preserved and vanishing gradients are mitigated, because the signal does not have to undergo a non-linear transformation at every layer. The UNet decoder upsamples the segmented features back to the original image size, and fine details lost during encoding are sometimes not recovered in this process. Residual blocks in the decoder help recover these details: their skip connections re-introduce features that would otherwise be missed and carry the initial features through to the last layer. This study keeps the UNet bridge, which preserves significant information and ensures a strong correlation between the features extracted by the encoder and those passed to the decoder. Both hard and soft exudates are segmented in this study.
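The decoder step described above, restoring resolution and bringing encoder features back in, can be sketched on plain nested lists. The sketch assumes nearest-neighbour 2 × 2 upsampling (the default interpolation of Keras `UpSampling2D`; the source does not state which interpolation is used) and UNet-style channel concatenation; the function names and the channels × height × width layout are illustrative choices.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def upsample2x_nearest(fmap: FeatureMap) -> FeatureMap:
    """Repeat every pixel into a 2x2 block, doubling height and width."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat along the width
            rows.append(wide)
            rows.append(list(wide))                    # repeat along the height
        out.append(rows)
    return out


def concat_channels(decoder: FeatureMap, encoder: FeatureMap) -> FeatureMap:
    """Stack encoder features onto upsampled decoder features of the same size."""
    if (len(decoder[0]), len(decoder[0][0])) != (len(encoder[0]), len(encoder[0][0])):
        raise ValueError("spatial sizes must match")
    return decoder + encoder


low_res = [[[1, 2], [3, 4]]]                                   # hypothetical 1-channel 2x2 decoder input
skip = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]       # hypothetical 2-channel 4x4 encoder output
merged = concat_channels(upsample2x_nearest(low_res), skip)
print(len(merged), len(merged[0]), len(merged[0][0]))          # 3 4 4
```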
The proposed architecture is expected to segment exudates in retinal images accurately. Its effectiveness is evaluated using accuracy, sensitivity, specificity, precision, F1-Score, Geometric Average (G-Mean), Intersection over Union (IoU), and Area Under Curve (AUC).

2. Method

This study uses retinal images from the Indian Diabetic Retinopathy Image Dataset (IDRiD) ^[20]. The study consists of two stages: pre-processing and segmentation. The pre-processing stage consists of image cropping, augmentation, green channel extraction, and contrast stretching. The segmentation stage separates objects in the image using the architecture proposed in this study and consists of training and testing. The overall stages are shown in Fig. 1.

Fig. 1. Illustration of the study stages in IDRID dataset segmentation.

2.1. Pre-processing

Pre-processing improves image quality and prepares the images for feature extraction; the segmentation stage then uses the pre-processed images. In some cases, pre-processing the captured image can produce better-quality images ^[21]. The pre-processing steps in this study are:

1. Image cropping: Image cropping separates part of an image. It helps the observer focus on important areas, especially small, non-dominant areas such as exudates, so that these areas can be recognized in more detail. Fig. 2 shows the cropping process. Fig. 2(a) is the original image, which shows all areas of the image, and Fig. 2(b) shows the result of cropping the desired area by removing unnecessary parts.

2. Augmentation: Only 500 images are available in this research, and such a limited amount of data can affect the performance of deep learning methods for image segmentation. To overcome this, the study applies augmentation, i.e., random image transformations such as flipping, rotating, and shifting. The study uses horizontal and vertical flipping because these techniques are easy to implement and provide different variations of the original image. Each image is flipped twice, vertically and horizontally, so that 1000 retinal images are available after augmentation. The augmentation process is illustrated in Fig. 2.

Fig. 2. Illustration of (a) cropping process and (b) flipping augmentation.

3. Green Channel: The green channel is one of the three channels of an RGB image, and extracting it is part of the pre-processing stage. The green channel provides the maximum local contrast between background and foreground, as shown in Fig. 3 ^[22]: it shows the intensity differences between objects in the image more clearly than the other channels.

Fig. 3. The Result of Each Channel on RGB Image

4. Contrast Stretching: Contrast stretching is a pre-processing technique that produces images with improved contrast by adjusting the intensity range of the pixels in an image. It makes dark areas darker and light areas brighter ^[23].
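The four pre-processing steps can be sketched in pure Python on images stored as nested lists of (R, G, B) tuples. The border margin, the linear min-max form of contrast stretching, and all function names are assumptions for illustration; the source gives neither the crop size nor the stretching formula. Flipping commutes with the symmetric crop and with the colour processing, so `augment` can be applied before or after `preprocess`.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def crop_border(img: Sequence[Sequence], margin: int) -> List[list]:
    """Trim `margin` pixels from every side (the same amount on each side)."""
    h, w = len(img), len(img[0])
    return [list(row[margin:w - margin]) for row in img[margin:h - margin]]


def flip_horizontal(img):
    """Mirror the image left-right."""
    return [list(reversed(row)) for row in img]


def flip_vertical(img):
    """Mirror the image top-bottom."""
    return [list(row) for row in reversed(img)]


def green_channel(img: Sequence[Sequence[RGB]]) -> List[List[int]]:
    """Keep only the G component of each (R, G, B) pixel."""
    return [[g for (_r, g, _b) in row] for row in img]


def contrast_stretch(gray, out_min: int = 0, out_max: int = 255):
    """ASSUMPTION: linear stretch of the image's [min, max] range onto [out_min, out_max]."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [[out_min] * len(row) for row in gray]
    return [[round(out_min + (v - lo) * (out_max - out_min) / (hi - lo)) for v in row]
            for row in gray]


def preprocess(img, margin: int):
    """Crop, then extract the green channel, then stretch its contrast."""
    return contrast_stretch(green_channel(crop_border(img, margin)))


def augment(img):
    """Two flipped copies per image: horizontal and vertical."""
    return [flip_horizontal(img), flip_vertical(img)]


# Toy 4x4 RGB image whose green values are 0, 10, ..., 150.
toy = [[(0, 10 * (4 * i + j), 0) for j in range(4)] for i in range(4)]
print(preprocess(toy, margin=1))  # [[0, 51], [204, 255]]
```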

2.2. RES2U-NET architecture

The RES2U-Net architecture combines UNet and residual blocks. UNet has a U-shaped structure and uses concatenation layers to pass information from the encoder to the decoder, which is useful for image segmentation ^[24]. The UNet architecture uses convolutional layers, ReLU, max pooling, transposed convolution, concatenation, and sigmoid operations. The input image is processed by 64 kernels, and the first block operates with 128 kernels. The second and third blocks repeat the same operations with 256 and 512 kernels, respectively. The bridge that links the encoder and decoder has 1024 kernels. On the decoder path, transposed convolution of size $2 \times 2$ is applied. The first decoder block uses 512 kernels, and the second to fourth blocks repeat the same process with 256, 128, and 64 kernels, respectively. In each decoder block, the corresponding features from the encoder path are concatenated. The Residual Network (ResNet) architecture is built from residual blocks ^[25] and is commonly used for classification. Each residual block learns the difference (residual) between its input and output, which simplifies the learning task. In ResNet, a residual block consists of convolutional layers, batch normalization, and ReLU. Residual blocks can serve as the encoder in segmentation tasks; in the UNet decoder, they should be combined with upsampling to raise the lower-resolution feature maps to a higher resolution. The operations used in RES2U-Net are:

(1)

$ c_{ij} = \sum_{u=0}^{g-1} \sum_{v=0}^{h-1} d^{(p)}_{i+u,\, j+v} \times a^{(q)}_{u,v} + b_q. $

In Eq. (1), $i$ and $j$ are the row and column indices of the output matrix, and $g$ and $h$ are the height and width of the kernel. $c_{ij}$ is the entry of the convolution result in the $i$-th row and $j$-th column. $d^{(p)}_{i+u,\,j+v}$ is the entry of the $p$-th input matrix $D_p$ in the $(i+u)$-th row and $(j+v)$-th column, $a^{(q)}_{u,v}$ is the entry of the $q$-th kernel $A_q$ in the $u$-th row and $v$-th column, and $b_q$ is the bias of the $q$-th kernel.
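Eq. (1) can be checked with a direct implementation for one input matrix and one kernel, with the kernel entry indexed by its own offsets $(u, v)$. The sketch assumes stride 1 and no padding ("valid" output), and, like CNN libraries, it does not flip the kernel (strictly a cross-correlation). A real convolutional layer also sums over all input channels $p$; that sum is omitted here to stay close to Eq. (1).

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(d: Matrix, a: Matrix, b: float = 0.0) -> Matrix:
    """c[i][j] = sum_u sum_v d[i+u][j+v] * a[u][v] + b, with stride 1 and no padding."""
    g, h = len(a), len(a[0])                        # kernel height and width
    rows, cols = len(d) - g + 1, len(d[0]) - h + 1  # size of the 'valid' output
    return [[sum(d[i + u][j + v] * a[u][v] for u in range(g) for v in range(h)) + b
             for j in range(cols)]
            for i in range(rows)]


d = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
a = [[1, 1],
     [1, 1]]
print(conv2d_valid(d, a, b=0.5))  # [[12.5, 16.5], [24.5, 28.5]]
```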

1. Convolutional layer: The convolutional layer is the fundamental layer of a CNN. It slides various kernels over the input image, multiplies each kernel element-wise with the underlying pixel values, and sums the products; the result is a feature map. The convolution is calculated using Eq. (1).

2. Batch normalization: Batch normalization (BN) normalizes the activations in the layers of a neural network. BN can improve accuracy and accelerate training.

3. Rectified linear unit (ReLU): The ReLU activation function keeps the positive part of its input and sets negative values to zero. Activation functions let the network detect non-linear features and improve CNN performance.

4. Skip connection: A skip connection is a technique in the Residual Network (ResNet) designed to address the vanishing gradient problem. It speeds up training convergence by letting information and gradients flow directly through the network.

5. Sigmoid activation function: The sigmoid is a non-linear function frequently used in CNNs to add non-linearity to the model. Its output ranges from 0 to 1.

6. Binary cross entropy loss function: The binary cross entropy loss measures the average error between the predicted probability and the target label over all pixels, and training minimizes it.
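The element-wise operations in items 2, 3, 5, and 6 can be written out as small functions. This is a sketch under stated assumptions: batch normalization is shown in training mode over one flat batch (a deployed layer uses running statistics and normalizes per channel), the learnable scale `gamma` and shift `beta` are fixed, and the loss clips probabilities by `eps` to avoid `log(0)`. The logits and labels in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence


def batch_norm(xs: Sequence[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Training-mode BN: normalize to zero mean and unit variance, then scale and shift."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


def relu(x: float) -> float:
    """Keep positive values and set negative values to zero."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function with output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean per-pixel loss -[t * log(p) + (1 - t) * log(1 - p)]."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


logits = [2.0, -1.0, 0.5, -3.0]  # hypothetical pre-activation outputs for 4 pixels
labels = [1, 0, 1, 0]            # 1 = exudate, 0 = background
probs = [sigmoid(z) for z in logits]
print(binary_cross_entropy(labels, probs))
```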

2.3. Training and Testing

Training is the stage in which the model learns to recognize patterns in the data; its result is the set of best weights used in the testing stage. The pre-processing stage yields 1000 images, which are divided into 68% training data, 17% validation data, and 15% testing data. Training begins by initializing parameters such as the number of epochs and the batch size. The input images then go through green channel extraction and contrast stretching, and the enhanced images are used to train the RES2U-Net model. The weights obtained during training are then used in testing, which evaluates how well the trained model performs on unseen images. To measure the effectiveness of RES2U-Net, the segmentation results of the retinal images are compared with the ground truth.
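The 68/17/15 split of the 1000 pre-processed images can be sketched as a seeded shuffle followed by two cuts. The seed, the function name, and splitting a flat list are assumptions. Because flipped copies derived from the same original image are near-duplicates, splitting by original image, which this sketch does not do, would prevent copies of one image from appearing in both the training and test sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], train_pct: int = 68, val_pct: int = 17,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into training, validation, and test (the remainder)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n = len(shuffled)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 680 170 150
```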

2.4. Evaluation

Performance in the segmentation process is evaluated using the confusion matrix, whose entries count image pixels. True Positive (TP) is the number of exudate (object) pixels predicted correctly, False Positive (FP) is the number of background pixels incorrectly predicted as exudate, True Negative (TN) is the number of background pixels predicted correctly, and False Negative (FN) is the number of exudate pixels incorrectly predicted as background ^[26]. The performance measures used are Accuracy (Acc), Sensitivity (Sen), Specificity (Spe), F1-Score, IoU, Geometric Average (G-Mean), Precision (Prec), and Area Under Curve (AUC).
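The measures listed above follow directly from the four confusion-matrix counts. The sketch below treats exudate as the positive class, uses hypothetical counts, and assumes every denominator is non-zero. AUC is not included because it needs the continuous scores rather than a single confusion matrix.

```python
import math
from typing import Dict


def segmentation_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Pixel-level measures from one confusion matrix (exudate = positive class)."""
    sen = tp / (tp + fn)   # sensitivity, recall, true positive rate
    spe = tn / (tn + fp)   # specificity, true negative rate
    prec = tp / (tp + fp)  # precision
    return {
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "Sen": sen,
        "Spe": spe,
        "Prec": prec,
        "F1": 2 * prec * sen / (prec + sen),
        "G-Mean": math.sqrt(sen * spe),
        "IoU": tp / (tp + fp + fn),
    }


# Hypothetical counts for a tiny 100-pixel image.
print(segmentation_metrics(tp=8, fp=2, tn=85, fn=5))
```

As a consistency check, plugging the reported Sen = 0.7691, Spe = 0.9956, and Prec = 0.8470 into the F1 and G-Mean formulas gives about 0.806 and 0.875, in line with the reported 0.81 and 0.88; the identity IoU = F1 / (2 − F1), which holds when both are computed from the same pooled counts, gives about 0.675.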

3. Result

3.1. Preprocessing

At this stage, pre-processing is applied to the original images, obtained from the IDRID dataset, to improve their quality before segmentation. The results are shown in Fig. 4. The input image is first cropped to focus on the central part of the image, trimming the same amount from each side. The green channel, which provides significant local contrast between background and foreground, is then extracted from the cropped image. Finally, contrast stretching is applied to sharpen the contrast of the green-channel image.

Fig. 4. Preprocessing results. (a) Original image. (b) Cropping. (c) Green Channel. (d) Contrast Stretching.

3.2. RES2U-Net Application

The modification uses UNet as the main foundation and places the residual blocks of the ResNet architecture in the UNet encoder and decoder. Batch normalization in each convolution operation normalizes the activations of each layer, and the ReLU activation function in each convolution operation retains the positive values of its input and detects non-linear features. The encoder path uses 4 residual blocks consisting of convolutional layers, batch normalization, and ReLU. The bridge between the encoder and decoder applies convolution followed by batch normalization and ReLU twice, with 1024 kernels. Residual blocks are used again in the decoder path. The final step is a convolution with 1 kernel followed by a sigmoid activation, which produces the segmented image. The RES2U-Net architecture is shown in Fig. 5.

Fig. 5. Illustration of the RES2U-Net architecture for exudates segmentation.

Fig. 5 shows that RES2U-Net consists of three parts: encoder, bridge, and decoder. The encoder first processes the input image with 32 kernels, followed by the first residual block with 64 kernels. The second to fourth blocks repeat the process of the previous block with 128, 256, and 512 kernels, respectively. The bridge has 1024 kernels. On the decoder path, UpSampling of size $2 \times 2$ is applied. The first decoder residual block uses 512 kernels, and the second to fourth blocks repeat the process with 256, 128, and 64 kernels, respectively. Finally, a convolution with 1 kernel followed by a sigmoid activation produces the segmented image.
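The channel and resolution bookkeeping of the encoder, bridge, and decoder can be traced with a short script. It assumes a 256 × 256 single-channel (green) input, a 2 × 2 downsampling step after each encoder block, and UNet-style concatenation of encoder features after each upsampling; the source states the kernel counts and the 2 × 2 UpSampling but not how downsampling or merging is done. The script prints shapes only and does not build a network.

```python
from typing import List, Tuple

Shape = Tuple[str, int, int, int]  # (stage, height, width, channels)


def res2unet_shapes(size: int = 256, in_channels: int = 1) -> List[Shape]:
    """Trace feature-map shapes through the encoder, bridge, and decoder."""
    shapes: List[Shape] = [("input", size, size, in_channels),
                           ("initial conv", size, size, 32)]
    skips = []  # encoder outputs kept for the decoder
    s = size
    for k in (64, 128, 256, 512):
        shapes.append((f"encoder residual block ({k})", s, s, k))
        skips.append((s, k))
        s //= 2  # ASSUMPTION: 2x2 downsampling after each encoder block
    shapes.append(("bridge", s, s, 1024))
    channels = 1024
    for k in (512, 256, 128, 64):
        s *= 2  # 2x2 UpSampling
        skip_size, skip_k = skips.pop()
        assert skip_size == s, "skip and upsampled maps must align"
        # ASSUMPTION: UNet-style concatenation of the matching encoder output
        shapes.append(("upsample + concat", s, s, channels + skip_k))
        shapes.append((f"decoder residual block ({k})", s, s, k))
        channels = k
    shapes.append(("1x1 conv + sigmoid", s, s, 1))
    return shapes


for stage, h, w, c in res2unet_shapes():
    print(f"{stage:32s} {h:4d} x {w:4d} x {c:5d}")
```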

3.3. Training and Testing

The images used to train the RES2U-Net architecture measure 256 $\times$ 256 pixels. The data consist of 680 training images, 170 validation images, and 150 testing images. Training runs for 200 epochs with 41 batches per epoch. At the start of training the weights are initialized, and the loss is then computed for each epoch. The weights are saved whenever the validation loss is smaller than in the previous epoch, and they are updated for the next epoch. RES2U-Net uses convolutional layers, batch normalization, ReLU, UpSampling, and sigmoid operations. The accuracy and loss curves for training and validation are shown in Fig. 6.
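The weight-saving rule can be sketched independently of any framework. Reading "smaller than the previous epoch" as "smaller than the best validation loss so far" is an assumption (it is the usual checkpointing rule), and the three callables are hypothetical placeholders for a real training step, validation pass, and weight copy. In Keras, the `ModelCheckpoint` callback with `monitor="val_loss"` and `save_best_only=True` provides this behaviour.

```python
from typing import Callable, Optional, Tuple, TypeVar

W = TypeVar("W")


def fit_with_best_checkpoint(train_one_epoch: Callable[[], None],
                             validation_loss: Callable[[], float],
                             snapshot_weights: Callable[[], W],
                             epochs: int = 200) -> Tuple[Optional[W], float, int]:
    """Train for `epochs`, keeping a copy of the weights whenever validation loss improves."""
    best_loss, best_weights, best_epoch = float("inf"), None, 0
    for epoch in range(1, epochs + 1):
        train_one_epoch()                # e.g. 41 mini-batch updates
        loss = validation_loss()
        if loss < best_loss:             # improvement over the best so far: checkpoint
            best_loss, best_weights, best_epoch = loss, snapshot_weights(), epoch
    return best_weights, best_loss, best_epoch


# Hypothetical stand-ins for a real model.
fake_val_losses = iter([0.50, 0.40, 0.45, 0.30, 0.35])
state = {"epoch": 0}


def train_one_epoch() -> None:
    state["epoch"] += 1


weights, loss, epoch = fit_with_best_checkpoint(
    train_one_epoch,
    lambda: next(fake_val_losses),
    lambda: f"weights@epoch{state['epoch']}",
    epochs=5,
)
print(weights, loss, epoch)  # weights@epoch4 0.3 4
```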

Fig. 6. RES2U-Net training and validation graph. (a) Accuracy and (b) loss for exudate segmentation on retinal image.

Fig. 6(a) shows that accuracy already reached 87.98% in the first epoch. In the following epochs it increased gradually, by very small amounts, reaching 98.18% at the last epoch. Fig. 6(b) shows that the training loss decreased, starting from 31.87% in the first epoch. The loss curves show no sign of overfitting: the gap between training loss and validation loss remains small. The architecture works very well, with accuracy above 95% and loss close to 0.

The sensitivity and specificity curves for training and validation are shown in Fig. 7. In Fig. 7(a), sensitivity was 23.45% in the first epoch; it then increased considerably up to epoch 20 and rose only slightly afterwards until the last epoch. In Fig. 7(b), specificity in the first epoch was 14.86% for training and 3.33% for validation. From the second epoch onward, specificity was above 50% and remained stable until the last epoch. The F1-Score and IoU curves for training and validation are shown in Fig. 8.

Fig. 7. RES2U-Net training and validation graph. (a) Sensitivity and (b) specificity for exudate segmentation on retinal image.

Fig. 8. RES2U-Net training and validation graph. (a) F1-Score and (b) IoU for exudate segmentation on retinal image.

In Fig. 8(a), the F1-Score in the first epoch was 16.59%. It increased gradually, exceeded 56% by the 5th epoch, and reached 87.06% at the last epoch. Fig. 8(b) shows that IoU increased during training: it was 0.1 in the first epoch, rose rapidly to above 0.3 by the 3rd epoch, and then remained stable until the last epoch. Although gaps remain between training and validation for F1-Score and IoU, the architecture shows a good balance between recall and precision and a high similarity to the ground truth.

After training, the learned weights are tested on images the model has not seen before, to determine how well the proposed model segments retinal images compared with the ground truth. Table 1 shows several original images from the testing stage together with their ground truth and the model's predictions. The predicted exudates closely match the ground truth, although some exudate pixels are still not identified, as shown by slight differences between the predictions and the ground truth, especially in small exudate areas.

Table 1. Comparison of segmented images and ground truth using RES2U-Net for exudate segmentation on retinal image

Performance is evaluated by comparing the predictions with the ground truth using the confusion matrix. RES2U-Net achieves an accuracy of 98.86%, a sensitivity of 76.91%, a specificity of 99.56%, and a precision of 84.70%. The high accuracy reflects good overall performance on both foreground and background labels. Because the sensitivity is much lower than the specificity, the model identifies background pixels better than exudate (foreground) pixels, and the high specificity confirms its strong performance on background areas. The fairly high precision shows that most pixels predicted as exudate are indeed exudate. F1-Score and G-Mean measure the balance between the majority (background) and minority (exudate) classes: on the IDRID data, the F1-Score is 0.81 and the G-Mean is 0.88. The IoU is 0.68. Another performance measure is the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate over all classification thresholds. The Area Under the Curve (AUC) summarizes the ROC curve as a single number between 0 and 1. The ROC curve of RES2U-Net is shown in Fig. 9. The curve approaches the upper left corner, which means the model has a good sensitivity (true positive rate, TPR) of 0.769 and detects most of the exudate in the image. The false positive rate (FPR) is 0.004, meaning the model rarely labels background pixels as exudate.
With an AUC of 0.88, the ROC curve shows that RES2U-Net performs well for exudate segmentation and separates exudate from background accurately.
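How a ROC curve and its AUC are obtained from per-pixel probabilities can be sketched by sweeping the threshold over every distinct score and integrating with the trapezoidal rule. The labels and scores are hypothetical, and the quadratic loop is for illustration only; full-size images would call for a sort-based implementation. For comparison, a curve built from a single operating point (TPR 0.769, FPR 0.004) has area (Sen + Spe) / 2 ≈ 0.88.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by thresholding `scores` at every distinct value."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):  # highest threshold first
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve whose points are sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y = [0, 0, 1, 1]           # hypothetical labels (1 = exudate)
p = [0.1, 0.4, 0.35, 0.8]  # hypothetical sigmoid outputs
print(auc_trapezoid(roc_points(y, p)))  # 0.75
```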

Fig. 9. ROC curve using RES2U-Net for exudate segmentation on retinal image.

4. Discussion

This study has applied the RES2U-Net architecture to segment exudates in retinal images. The performance measures used for comparison are Accuracy (Acc), Sensitivity (Sen), Specificity (Spe), Precision (Prec), F1-Score, G-Mean, Intersection over Union (IoU), and Area Under the Curve (AUC). Table 2 and Table 3 compare RES2U-Net with other studies that used the IDRID dataset for exudate segmentation. Table 2 shows that RES2U-Net achieves the highest accuracy, sensitivity, specificity, and precision on the IDRID dataset.

The accuracy shows that RES2U-Net classifies exudate and background pixels very well, although accuracy is dominated by the much larger background area. The sensitivity indicates that RES2U-Net segments exudates (foreground) well, even though some exudate pixels are still mistakenly recognized as background (non-exudate). The specificity shows that RES2U-Net segments background (non-exudate) areas excellently. The precision shows that the exudate predictions are accurate, although some background areas are still detected as exudate. F1-Score and G-Mean measure how well the model balances the majority and minority classes: F1-Score measures the balance between precision and recall, and G-Mean gives an overall picture of the model's performance across both classes. The other measures are IoU and AUC. The F1-Score shows that the architecture has a fairly balanced performance between precision and recall, with relatively few errors in recognizing target objects. A G-Mean approaching 1 indicates that the model maintains stable performance without bias toward either class (object or background). Table 3 shows that RES2U-Net achieves an IoU of 70%, meaning that the overlap between the predicted and ground-truth exudate regions is about 70% of their union. The AUC confirms the ability of the proposed architecture to distinguish foreground from background pixels. On the IDRID dataset, RES2U-Net achieves the highest F1-Score, G-Mean, IoU, and AUC among comparable studies. Although its sensitivity and IoU are only in the good range, both are still the highest compared with other studies ^[9], ^[27], ^[28], ^[29].

Table 2. The comparison of accuracy (Acc), sensitivity (Sn), specificity (Sp), and precision (Prec) results in the study with other studies

Method	Acc (%)	Sn (%)	Sp (%)	Prec (%)
DL Tensorflow ^[27]	96.70	41.56	98.29	41.31
UNet ^[9]	-	67.13	-	-
DCNN ^[28]	95.93	71.19	98.32	-
SS-MAF ^[29]	-	69.39	-	-
RES2U-NET	98.86	76.91	99.56	84.70

Table 3. The comparison of F1-Score, G-Mean, IoU, and AUC results in the study with other studies

Method	F1-Score	G-Mean	IoU	AUC
DL Tensorflow ^[27]	0.42	0.64	0.26	-
UNet ^[9]	0.80	-	0.67	-
DCNN ^[28]	-	0.84	-	-
SS-MAF ^[29]	0.73	-	0.57	0.86
RES2U-NET	0.81	0.88	0.7	0.88

The accuracy above 90% indicates that RES2U-Net segments exudate (foreground) and background areas very well overall; however, the sensitivity shows that it performs better on the background than on exudates. Further research is therefore needed to improve the detection of exudates (foreground) in retinal images. RES2U-Net is capable of segmenting both hard and soft exudates in retinal images. The study provides an alternative CNN architecture that can be developed into an automatic system for the early detection of retinal abnormalities.

5. Conclusion

The implementation of RES2U-Net provides excellent results for exudate segmentation in retinal images. Its accuracy above 90% shows that it performs well overall. The sensitivity, specificity, and precision show a good ability to identify background areas and relevant objects in the image. The model also separates background and foreground well, as reflected in the F1-Score of 0.81, which indicates a good balance between precision and recall. The G-Mean and AUC show that RES2U-Net distinguishes exudate from background well, and the sensitivity shows that it detects exudates quite well. Although RES2U-Net successfully identifies exudates and separates them from the background in a balanced way, the IoU shows that the overlap between the segmentation results and the ground truth still needs to be improved so that the segmentation becomes more valid and accurate.

Acknowledgment

The study and publication of this article were funded by the DIPA of the Public Service Agency of Universitas Sriwijaya 2024, No. 0098.122/UN9/SB3.LP2M.PT/20244, dated November 21, 2023, in accordance with Rector’s Decree No. 00l3/UN9/LP2M.PT/2024, dated May 20, 2024.

References

P. Sarkar, O. Dewangan, A. Joshi, A review on applications of artificial intelligence on bionic eye designing and functioning, Scandinavian Journal of Information Systems, Vol. 35, No. 1, pp. 1119-1127, 2023

M. Kropp, O. Golubnitschaja, A. Mazurakova, L. Koklesova, N. Sargheini, T.-T. K. S. Vo, E. de Clerck, J. Polivka, P. Potuznik, J. Polivka, I. Stetkarova, P. Kubatke, G. Thumann, Diabetic retinopathy as the leading cause of blindness and early predictor of cascading complications—risks and mitigation, EPMA Journal, Vol. 14, No. 1, pp. 21-42, 2023

P. K. Jena, B. Khuntia, C. Palai, M. Nayak, T. K. Mishra, S. N. Mohanty, A novel approach for diabetic retinopathy screening using asymmetric deep learning features, Big Data and Cognitive Computing, Vol. 7, No. 1, 2023

S. Alqazzaz, X. Sun, X. Yang, L. Nokes, Automated brain tumor segmentation on multi-modal MR image using SegNet, Computational Visual Media, Vol. 5, No. 2, pp. 209-219, 2019

A. Desiani, E. B. Suprihatin, D. Riana, M. Arhami, I. Ramayanti, Y. Utama, Denoised non-local means with BDDU-Net architecture for robust retinal blood vessel segmentation, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 37, No. 16, pp. 1-27, 2023

A. Desiani, Erwin, B. Suprihatin, F. Efriliyanti, M. Arhami, E. Setyaningsih, VG-DropDNet a robust architecture for blood vessels segmentation on retinal image, IEEE Access, Vol. 10, pp. 92067-92083, 2022

M. Hashemi, Enlarging smaller images before inputting into convolutional neural network: zero-padding vs. interpolation, Journal of Big Data, Vol. 6, No. 1, pp. 1-13, 2019

A. Desiani, Erwin, B. Suprihatin, S. B. Agustina, A robust techniques of enhancement and segmentation blood vessels in retinal image using deep learning, Biomedical Engineering: Applications, Basis and Communications, Vol. 34, No. 4, pp. 2250019, 2022

Y. Xu, Z. Zhou, X. Li, N. Zhang, M. Zhang, P. Wei, FFU-Net: feature fusion U-Net for lesion segmentation of diabetic retinopathy, BioMed Research International, pp. 1-12, 2021

L. Geng, H. Che, Z. Xiao, Y. Liu, Extracting retinal anatomy and pathological structure using multiscale segmentation, Applied Sciences, Vol. 9, No. 18, 2019

F. Zabihollahy, A. Lochbihler, E. Ukwatta, Deep learning based approach for fully automated detection and segmentation of hard exudate from retinal images, Proceedings of SPIE, Vol. 13, No. 9, pp. 101420-1095308, 2019

R. Li, S. Zheng, C. Duan, C. Zhang, J. Su, P. M. Atkinson, Multi-attention-network for semantic segmentation of fine resolution remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, Vol. 60, pp. 1-13, 2021

S.-H. Noh, Performance comparison of CNN models using gradient flow analysis, Informatics, Vol. 8, No. 3, pp. 1-13, 2021

S. Targ, D. Almeida, K. Lyman, ResNet in ResNet: Generalizing residual architectures, arXiv preprint arXiv:1603.08029, 2016

Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis, K. Grauman, R. Feris, BlockDrop: dynamic inference paths in residual networks, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8817-8826, 2018

G. Singh, A. Mittal, N. Aggarwal, ResDNN: deep residual learning for natural image denoising, IET Image Processing, Vol. 14, No. 11, pp. 2425-2434, 2020

P. Porwal, S. Pachade, M. Kokare, G. Deshmukh, IDRiD: diabetic retinopathy-segmentation and grading challenge, Medical Image Analysis, Vol. 59, pp. 101561, 2020

A. Subhasree, J. B. Princess, Salaja, Analysis and automatic detection of microaneurysms in diabetic retinopathy using transfer learning, Proc. of 6th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 1-7, 2022

M. Mateen, J. Wen, N. Nasrullah, S. Sun, S. Hayat, Exudate detection for diabetic retinopathy using pretrained convolutional neural networks, Complexity, 2020

P. Porwal, S. Pachade, M. Kokare, G. Deshmukh, J. Son, Indian Diabetic Retinopathy Image Dataset (IDRiD), 2020

K. Dharavath, F. A. Talukdar, R. H. Laskar, Improving face recognition rate with image preprocessing, Indian Journal of Science and Technology, Vol. 7, No. 8, pp. 1170-1175, 2014

A. Fauzi, L. E. Lubis, Optimization of retinal blood vessel segmentation based on Gabor filters and particle swarm optimization, Indonesian Journal of Electrical Engineering and Computer Science, Vol. 29, No. 3, pp. 1590-1596, 2023

D. A. Dharmawan, L. Listyalina, Retinal blood vessel segmentation as a tool to detect diabetic retinopathy, Journal of Electrical Technology UMY, Vol. 3, No. 2, pp. 44-49, 2019

A. Desiani, Erwin, B. Suprihatin, S. Yahdin, A. I. Putri, F. R. Husein, Bi-path architecture of CNN segmentation and classification method for cervical cancer disorders based on Pap-smear images, IAENG International Journal of Computer Science, Vol. 48, No. 3, pp. 782-791, 2021

Z. Fan, H. Lin, C. Li, J. Su, S. Bruno, G. Loprencipe, Use of parallel ResNet for high-performance pavement crack detection and measurement, Sustainability, Vol. 14, No. 3, pp. 1-21, 2022

M. Arhami, A. Desiani, S. Yahdin, A. Islamia, Contrast enhancement for improved blood vessels retinal segmentation using top-hat transformation and Otsu thresholding, International Journal of Advances in Intelligent Informatics, Vol. 8, No. 2, pp. 210-223, 2022

A. Benzamin, C. Chakraborty, Detection of hard exudates in retinal fundus images using deep learning, Proc. of International Conference on Informatics, Electronics and Vision, pp. 465-469, 2018

S. Basu, S. Mukherjee, A. Bhattacharya, A. Sen, Segmentation of blood vessels, optic disc localization, detection of exudates, and diabetic retinopathy diagnosis from digital fundus images, Advances in Intelligent Systems and Computing, pp. 173-184, 2020

J. Zhang, X. Chen, Z. Qiu, M. Yang, Y. Hu, J. Liu, Hard exudate segmentation supplemented by super-resolution with multi-scale attention fusion module, Proc. of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1375-1380, 2022

Anita Desiani

Anita Desiani is an associate professor in the Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Sriwijaya. She received a bachelor’s degree in mathematics from Universitas Sriwijaya in 2000, a master’s degree in computer science from Universitas Gadjah Mada in 2003, and a doctoral degree from the Mathematics and Science Department in 2022. Her current research interests include data mining, image processing, pattern recognition, computer vision, and artificial intelligence.

Bambang Suprihatin

Bambang Suprihatin received his bachelor’s degree in mathematics from Universitas Sriwijaya, Indonesia, in 1994, and an M.Sc. degree in mathematics from the Bandung Institute of Technology (ITB), Bandung, Indonesia, in 2002. He became an Associate Professor in 2011. He received his doctorate in mathematics from Universitas Gadjah Mada (UGM) in 2016. His current research interests are statistics and modeling. He has authored books on statistics and mathematics.

Muhammad Suedarmin

Muhammad Suedarmin was born in Muara Bungo on 7 November 2000. He received a bachelor’s degree in mathematics from Universitas Sriwijaya in 2022. In 2019, he joined the Computer Laboratory of the Faculty of Mathematics and Natural Sciences, Universitas Sriwijaya, as an assistant lecturer. His current research interests include image processing, pattern recognition, computer vision, data mining, and artificial intelligence.

Siti Rusdiana Puspa Dewi

Siti Rusdiana Puspa Dewi is a lecturer in the Dentistry Study Program, Faculty of Medicine, Universitas Sriwijaya. She graduated as a dentist from North Sumatra University in 2004. In 2015, she pursued a master’s degree in biomedical science at Universitas Sriwijaya. She is currently a student in the Bioscience Doctoral Program at the Faculty of Medicine, Universitas Sriwijaya.

Akmal Junaidi

Akmal Junaidi is an associate professor at the Department of Computer Science, Universitas Lampung, Indonesia. He received his Ph.D. degree in computer science from TU Dortmund, Germany in 2016, an M.Sc. degree in telematics from Universiteit Twente, The Netherlands in 2003, and a B.Sc. in Mathematics from Universitas Sriwijaya, Indonesia in 1995. His research interests include deep learning, image processing and analysis, pattern recognition, natural language processing, and steganography.

Muhammad Arhami

Muhammad Arhami received a bachelor’s degree in mathematics from Universitas Syiah Kuala in 2000 and a master’s degree in computer science from Universitas Gadjah Mada in 2004. He became an associate professor in 2013. His research fields are artificial intelligence, mathematics, data mining, software engineering, and data structures. He has authored books on artificial intelligence, expert systems, Matlab programming, and mathematics.