Anita Desiani (1,*), Bambang Suprihatin (1), Muhammad Suedarmin (1), Siti Rusdiana Puspa Dewi (2), Akmal Junaidi (3), Muhammad Arhami (4)
(1) Department of Mathematics, Universitas Sriwijaya, Indralaya, Indonesia. {anita_desiani, bambangs}@unsri.ac.id, muhammadsuedarmin@gmail.com
(2) Department of Medicine, Universitas Sriwijaya, Indralaya, Indonesia. sitirusdiana@fk.unsri.ac.id
(3) Department of Informatics Engineering, Universitas Lampung, Bandar Lampung, Indonesia. akmal.junaidi@fmipa.unila.ac.id
(4) Department of Informatics Engineering, Politeknik Negeri Lhokseumawe, Aceh, Indonesia. muhammad.arhami@pnl.ac.id
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Keywords
Diabetes, Exudate, Health research, Health risk, Segmentation
1. Introduction
The retina is the tissue layer behind the eyeball. The retina functions as a bridge
that converts light from the eye into signals that are sent to the brain to form vision
[1]. One of the disorders that occurs in the retina is diabetic retinopathy. Diabetic
retinopathy is a complication of diabetes mellitus that can cause blindness [2]. Diabetic retinopathy is characterized by the presence of lesions on the retina
called exudates. Exudates can be described as irregularly shaped yellow spots that
appear on the retina [3]. Exudates usually have sharp boundaries and are brighter than the retinal background.
Exudates are divided into two forms, namely soft exudates and hard exudates. Early
detection of this disease is essential for preventing vision loss. Identification
of exudates in retinal images can be done by carrying out a segmentation process.
Segmentation is employed to distinguish between several objects under study as foreground
and unstudied objects as background. Manual segmentation requires high precision and
special skills, and it takes a long time when the amount of data is large. Automatic
segmentation aims to overcome these limitations of manual segmentation [4,
5]. The application of a Convolutional Neural Network (CNN) can help medical personnel
segment images quickly and accurately [6]. CNN can detect and recognize objects from digital images [7]. A U-shaped network (UNet) is a standard CNN architecture for image segmentation
[8]. The study by Xu et al. [9] applied UNet for hard exudate segmentation and produced an F1-Score above 85%. However,
sensitivity is still below 80%. The study by [10] implemented UNet architecture for exudate segmentation but the accuracy and specificity
are still below 85%. The study by [11] applied UNet architecture only for hard exudate segmentation and produced sensitivity
and F1-Score below 80%. UNet requires a deep network to learn detailed
features in images [12]. The network is deepened by adding layers composed
of non-linear activation functions, convolution layers, and downsampling. The deep
layers make it difficult for the UNet encoder to update its training weights. The model’s
ability to capture important image features may decrease at deeper layers due
to repeated downsampling. The deep layers in UNet also affect the performance
of the decoder section. The process of returning the image to its original size through
up-sampling in the decoder is incomplete when the feature representation
from the encoder section is non-optimal. Non-optimal feature representation can result in a vanishing
gradient. The vanishing gradient can hinder the model from learning
complex features, thereby reducing overall segmentation performance [13].
One technique to overcome the UNet problem is to combine different architectures.
A Residual Network (ResNet) has a structure similar to UNet but is equipped with skip
connections inside its residual blocks. In a residual block, the original
information bypasses the layers of the block along the skip connection path [14]. A path that permits data to flow straight from the input to the next layer is called
a skip connection [15]. Skip connections help keep the original information intact and allow gradients to
more efficiently pass through the layers of the network during training [16]. Residual blocks can help prevent vanishing gradients during training. The application
of Residual block to segmentation has been carried out in several studies. Applying
residual block for segmentation involves adding several convolution layers at the
end of the network to convert the output into a segmented image. Several studies have
modified the UNet architecture with residual blocks for exudate segmentation in retinal
images. The study by [17] applied residual blocks on the UNet encoder for exudate segmentation. This study reported
an accuracy above 50% without reporting other performance measures. The study by [18] applied residual blocks on the UNet decoder for hard exudate segmentation. This
study achieved accuracy and specificity above 85%, but the sensitivity is still
below 80%. The study by [19] performed exudate segmentation using a modified UNet with residual blocks and produced
accuracy above 85% but did not report other performance measures.
This study proposes a modified UNet architecture in which the encoder and decoder
of UNet are replaced with the residual blocks of ResNet, called double residual blocks on a U-shaped network
(RES2U-Net), for exudate segmentation in retinal images. The residual blocks are not
only used to modify the UNet encoder section but also to modify the UNet decoder section.
The UNet Encoder section is responsible for extracting and compressing features from
the image using repeated convolutions and downsampling. Repeated convolutions and
downsampling can result in the loss of feature details, especially low-level features.
In addition, the deeper the layer, the more non-linear activations that cause a vanishing
gradient. The Residual block has a skip connection that provides a direct path to
send low-level features from the previous layer to the next layer so the feature details
can be preserved and can overcome the occurrence of vanishing gradient because it
does not need to undergo non-linear transformations at each layer. The UNet decoder
section performs upsampling to restore segmented features to the original image size.
In the upsampling process, some fine details that were lost during encoding cannot always be restored. The
use of residual blocks in the decoder section can help recover details lost
during the encoder process. Residual blocks allow the decoder to learn the lost information
and to recall features that have been missed by using skip connections. Skip
connections ensure that the initial features are carried over to the last layer. This
study still maintains the UNet bridge section. The bridge of the UNet helps preserve
significant data and guarantee a strong correlation between the features extracted
by the encoder and sent to the decoder. The exudates segmented in this study are not
only hard exudates but also soft exudates. The proposed architecture is expected to
be an accurate architecture for exudate segmentation in retinal images. Segmentation
results are evaluated based on accuracy, sensitivity, specificity, precision, F1-Score,
Geometric Average (G-Mean), Intersection over Union (IoU), and Area Under Curve (AUC)
to assess the proposed architecture’s exudate segmentation effectiveness.
2. Method
This study uses retinal image data obtained from the Indian Diabetic Retinopathy Image
Dataset (IDRiD) [20]. The study consists of a pre-processing stage and a segmentation stage.
The pre-processing stage is carried out by taking the Green Channel and applying Contrast
Stretching. The segmentation stage separates objects in the image
using the architecture proposed in this study and consists of
training and testing processes. The overall stages of this study can be seen in Fig. 1.
Fig. 1. Illustration of the study stages in IDRID dataset segmentation.
2.1. Pre-processing
Image quality can be improved at the pre-processing stage. Pre-processing is carried
out before feature extraction, and the segmentation stage uses the pre-processed
images. The purpose of pre-processing is to get the image ready for feature extraction.
In some cases, pre-processing the captured image can produce better-quality images
[21]. The pre-processing stages in this study are:
1. Image cropping: Image cropping is a technique to separate parts of an image. Image cropping helps
the observer focus on important areas, especially non-dominant areas such as exudates,
so that these areas can be recognized in more detail. Fig. 2 shows the cropping process. Fig. 2(a) is the original image, which shows all areas of the image. Fig. 2(b) shows the result of cropping the image to the desired area by removing unnecessary
parts.
2. Augmentation: The amount of data available in this research is limited to only 500 images.
A limited amount of data can affect the performance of deep learning methods for image
segmentation. To overcome this problem, the study applies an augmentation
method. Augmentation is an image transformation process, such as flipping,
rotating, or shifting, that is carried out randomly. The study uses horizontal
and vertical flipping because these techniques are
easy to implement and provide different variations of the original image. Each image is flipped twice, vertically and horizontally, so that the
total number of images available after augmentation is 1000 retinal images. The augmentation
process used is shown in Fig. 2.
Fig. 2. Illustration of (a) cropping process and (b) flipping augmentation.
3. Green Channel: The Green Channel is one of the three color channels contained in RGB images. Extracting
the Green Channel from RGB images can be done as part of the pre-processing stage. The Green
Channel can provide maximum local contrast between background and foreground [22], as shown
in Fig. 3. Fig. 3 shows that the Green Channel brings out the intensity differences between objects in
the image more clearly than the other channels.
Fig. 3. The Result of Each Channel on RGB Image
4. Contrast Stretching: Contrast Stretching serves as a pre-processing technique aimed at delivering images
with improved contrast. Contrast Stretching involves adjusting the intensity range
of pixels in an image. This technique works by increasing the darkness of dark areas
and the brightness of light areas [23].
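The pre-processing steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, and contrast stretching is assumed here to be the common linear min-max form, which the text does not specify.

```python
import numpy as np

def green_channel(rgb):
    """Return the green plane of an H x W x 3 RGB array."""
    return rgb[..., 1]

def contrast_stretch(gray, out_min=0.0, out_max=255.0):
    """Linearly map the input range [min, max] onto [out_min, out_max].

    Assumption: linear min-max stretching; the paper does not give the formula.
    """
    gray = gray.astype(np.float64)
    lo, hi = gray.min(), gray.max()
    if hi == lo:  # flat image: nothing to stretch
        return np.full_like(gray, out_min)
    return (gray - lo) / (hi - lo) * (out_max - out_min) + out_min

def flip_augment(image):
    """Return the horizontally and vertically flipped copies of an image."""
    return np.fliplr(image), np.flipud(image)

# Tiny synthetic "retinal image" whose green plane spans 60..180.
rgb = np.zeros((2, 3, 3), dtype=np.uint8)
rgb[..., 1] = [[60, 90, 120], [150, 165, 180]]

g = green_channel(rgb)
stretched = contrast_stretch(g)      # darkest pixel -> 0, brightest -> 255
h_flip, v_flip = flip_augment(stretched)
```

After stretching, the narrow intensity range 60 to 180 occupies the full 0 to 255 range, which matches the description of dark areas becoming darker and light areas becoming brighter.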
2.2. RES2U-NET architecture
The RES2U-Net architecture is a combination of UNet and residual block. The UNet has
a structure resembling the letter U, utilizing concatenate layers to pass information
from encoder to decoder, which is useful for image segmentation [24]. The UNet architecture comprises several operations: Convolutional
Layer, ReLU, Max Pooling, Transposed Convolution, Concatenate, and Sigmoid. The input
image is handled by 64 kernels, while the first block operates with 128 kernels. In
UNet, the second and third blocks repeat the same operation. The second block contains
256 kernels, while the third block contains 512 kernels. The bridge section, which
is the link between the encoder and decoder, has 1024 kernels. On the decoder path,
Transposed Convolution is carried out with size $2 \times 2$. The first block on the
decoder uses 512 kernels. The second to fourth blocks repeat the same process. There
are 256 kernels in the second block, 128 in the third, and 64 in the fourth block.
At each block in the decoder path, a concatenate process occurs from the encoder path.
Residual Network (ResNet) architecture is an architecture that uses residual blocks
[25]. ResNet architecture is commonly used in classification cases. The architecture aims
to learn the difference between the input and output, thereby simplifying the learning
task using residual blocks. In the ResNet architecture, there is a collection of residual
blocks consisting of a convolutional layer, batch normalization, and ReLU. The residual
block can be used as an encoder part for segmentation cases. Residual blocks on the
decoder UNet should be combined with an upsampling to increase the image resolution
of the lower feature map to a higher image resolution. The operations involving RES2U-Net
are:
1. Convolutional layer: The convolutional layer is fundamental in CNN. This layer multiplies the pixel matrix by a kernel.
The convolutional layer applies various kernels to the input image to calculate the result,
which is represented as a feature map. The convolution process can be
calculated using Eq. (1).
In Eq. (1), $i$ is the row and $j$ is the column of the output matrix, and $g$ and $h$
are the length and width of the input matrix and kernel. $c_{ij}$ refers to
the matrix entry of the convolution result at the $i$-th row and $j$-th column. $d_{u+i,v+j}$
is the matrix entry of the $p$-th input from $D_p$ at the $(u+i)$-th row and $(v+j)$-th column.
$a_{u+i,v+j}$ is the matrix entry of the $q$-th kernel from $A_q$ at the $(u+i)$-th row and
$(v+j)$-th column. $b_q$ is the bias for the $q$-th kernel.
2. Batch normalization: Batch normalization (BN) normalizes the activations in the neural network
layers. BN can enhance accuracy and accelerate the training process.
3. Rectified linear unit (ReLU): The ReLU activation function is intended to capture the positive component of the
input while eliminating the negative component by converting it to zero. Activation
functions are used to detect non-linear features and improve CNN performance.
4. Skip connection: Skip connection is a technique in the Residual Network (ResNet) designed to address
the vanishing gradient problem. Skip connection accelerates network training convergence
by allowing the direct flow of information and gradients through the network.
5. Sigmoid activation function: The sigmoid activation function is a non-linear function frequently utilized in CNNs
to add nonlinearity to the model. The sigmoid activation function generates outputs
that range from 0 to 1.
6. Binary cross entropy loss function: The binary cross entropy loss function is minimized to reduce the average error between the predictions
and the target labels for each pixel.
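The operations listed above can be illustrated with a small NumPy sketch. It assumes the standard single-channel sliding-window form of convolution (no padding, stride 1) and takes `g` and `h` as the kernel height and width; batch normalization is omitted, and `residual_unit` is a hypothetical, simplified stand-in for a residual block rather than the block used in RES2U-Net.

```python
import numpy as np

def conv2d_valid(d, a, b=0.0):
    """Slide kernel a over input d (no padding, stride 1) and add bias b."""
    g, h = a.shape
    rows = d.shape[0] - g + 1
    cols = d.shape[1] - h + 1
    c = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # Sum of element-wise products of the window and the kernel.
            c[i, j] = np.sum(d[i:i + g, j:j + h] * a) + b
    return c

def relu(x):
    """Keep positive values, set negative values to zero."""
    return np.maximum(x, 0.0)

def sigmoid(x):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Average per-pixel binary cross entropy."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    loss = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(loss))

def residual_unit(x, a, b=0.0):
    """ReLU(conv(x)) + x for a 3 x 3 kernel; zero padding keeps the shape.

    Simplified illustration of a skip connection (batch norm omitted).
    """
    padded = np.pad(x, 1)
    return relu(conv2d_valid(padded, a, b)) + x

d = np.arange(16.0).reshape(4, 4)
feature_map = conv2d_valid(d, np.ones((2, 2)), b=1.0)   # 3 x 3 output
skip_only = residual_unit(d, np.zeros((3, 3)))          # equals d
```

With an all-zero kernel the convolution branch contributes nothing, so `residual_unit` returns its input unchanged. This is the property that lets information and gradients bypass the non-linear layers.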
2.3. Training and Testing
Training is the stage for a model or architecture to learn to recognize various patterns
in the data. The result of training is the best weight that will be used in the testing
stage. The total data to be trained from the pre-processing stage is 1000 images.
This data will be divided into 68% for training data, 17% for validation data, and
15% for testing data. The training stage begins with initializing the parameters used
such as epoch and batch size. Next, the Green Channel is extracted from the input image
and its contrast is increased using Contrast Stretching. Data that has
gone through the image quality improvement stage is used to train the RES2U-Net
model. The weights acquired during the training process are applied during the
testing process. Testing aims to determine how well the trained model
performs on unseen data. To measure the effectiveness of the RES2U-Net architecture, the
segmentation results of retinal images are compared with the ground truth.
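The 68% / 17% / 15% division of the 1000 images can be sketched as follows. The random shuffling, the seed, and the function name `split_indices` are assumptions for illustration; the text does not state how images were assigned to each subset.

```python
import numpy as np

def split_indices(n_images, train=0.68, val=0.17, seed=0):
    """Shuffle image indices and cut them into train / validation / test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_images)
    n_train = int(round(n_images * train))
    n_val = int(round(n_images * val))
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]   # remainder, 15% here
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

For 1000 images this gives subsets of 680, 170, and 150 images, with no image shared between subsets.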
2.4. Evaluation
Performance evaluation in the segmentation process is carried out using the Confusion
Matrix. The Confusion Matrix contains values that represent image pixels and is used
to measure model performance. The values in the Confusion Matrix consist of TP (True
Positive), object pixels correctly predicted as object; FP (False Positive), background pixels incorrectly
predicted as object; TN (True Negative), background pixels correctly predicted as background;
and FN (False Negative), object pixels incorrectly predicted as background [26]. The performance measures used include Accuracy (Acc), Sensitivity
(Sen), Specificity (Spe), F1-Score, IoU, Geometric Average (G-Mean), Precision (Prec),
and Area Under Curve (AUC).
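The measures derived from the Confusion Matrix can be computed directly from the four pixel counts. The sketch below uses the standard definitions; the function name and the example counts are illustrative only and are not taken from the study. AUC is left out because it requires predicted probabilities rather than counts at a single threshold.

```python
import math

def segmentation_metrics(tp, fp, tn, fn):
    """Standard pixel-wise measures from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and
    at least one positive prediction).
    """
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn)                 # sensitivity (recall, TPR)
    spe = tn / (tn + fp)                 # specificity (TNR)
    prec = tp / (tp + fp)                # precision
    f1 = 2 * prec * sen / (prec + sen)   # harmonic mean of Prec and Sen
    iou = tp / (tp + fp + fn)            # intersection over union
    g_mean = math.sqrt(sen * spe)        # geometric mean of Sen and Spe
    return {"acc": acc, "sen": sen, "spe": spe, "prec": prec,
            "f1": f1, "iou": iou, "g_mean": g_mean}

# Hypothetical counts for a 1000-pixel image.
m = segmentation_metrics(tp=80, fp=20, tn=880, fn=20)
```

Because background pixels dominate in this example, accuracy (0.96) is much higher than IoU (about 0.67), which is why the class-balance-aware measures are reported alongside accuracy.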
3. Result
3.1. Preprocessing
At this stage, pre-processing is carried out on the initial image. The
pre-processing stage aims to improve image quality before it is used in the segmentation
stage. The images used in the pre-processing stage were obtained from the IDRiD dataset.
Pre-processing results can be seen in Fig. 4. In Fig. 4, the input image is cropped to focus on the central
part of the image. Cropping is carried out by trimming each side by
the same amount. The green channel is then taken from the cropped image,
which provides significant local contrast between the background
and foreground. Finally, contrast stretching is applied to the green channel
so that the image has a sharper contrast level.
Fig. 4. Preprocessing results. (a) Original image. (b) Cropping. (c) Green Channel.
(d) Contrast Stretching.
3.2. RES2U-Net Application
Architectural modifications are carried out using the UNet architecture which functions
as the main foundation. The residual block of ResNet architecture is positioned in
the encoder and decoder of UNet. The Batch Normalization process that accompanies each
convolution operation aims to normalize the activations in each layer. Each convolution
operation is followed by a ReLU activation function, which retains the positive
values of its input and detects non-linear features. The encoder path uses 4
ResNet blocks consisting of a Convolutional Layer, Batch Normalization, and ReLU.
In the link between the encoder and decoder, there is a convolution operation accompanied
by Batch Normalization and ReLU twice with 1024 kernels. Some ResNet blocks are again
used in the decoder path. The final process in this architecture is a convolution
operation with 1 kernel to reshape the image that has been segmented and accompanied
by a sigmoid activation function. The RES2U-Net architecture can be seen in Fig. 5.
Fig. 5. Illustration of the RES2U-Net architecture for exudates segmentation.
Fig. 5 shows that the RES2U-Net architecture consists of three parts, namely encoder, bridge,
and decoder. The encoder path uses 4 residual blocks consisting of a Convolutional
Layer, Batch Normalization, and ReLU. The initial step in the encoder path is
processing the input image with a total of 32 kernels, followed by the first ResNet
block using 64 kernels. The second to fourth blocks repeat the process of
the previous block: the second block has 128 kernels, the third block has 256 kernels,
and the fourth block has 512 kernels. The bridge section has 1024 kernels. On the
decoder path, UpSampling is carried out with size $2 \times 2$, followed by the first
residual block using 512 kernels. The second to fourth blocks repeat the process of
the previous block: the second block has 256 kernels, the third block has 128 kernels,
and the fourth block has 64 kernels. The final process in this architecture performs
a convolution operation with 1 kernel to reshape the segmented image and is accompanied
by a sigmoid activation function.
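The kernel counts described above can be traced through the network with a short sketch. It assumes a 256 $\times$ 256 input and a $2 \times 2$ downsampling after each encoder block, mirroring the $2 \times 2$ UpSampling on the decoder path; the downsampling factor is an assumption, since the text does not state it, and `trace_shapes` is a hypothetical helper.

```python
ENCODER_KERNELS = [64, 128, 256, 512]
BRIDGE_KERNELS = 1024
DECODER_KERNELS = [512, 256, 128, 64]

def trace_shapes(size=256, stem_kernels=32):
    """List (stage, spatial size, kernel count) along the U-shaped path."""
    shapes = [("stem", size, stem_kernels)]
    for k in ENCODER_KERNELS:
        shapes.append(("encoder", size, k))
        size //= 2          # assumed 2 x 2 downsampling after each block
    shapes.append(("bridge", size, BRIDGE_KERNELS))
    for k in DECODER_KERNELS:
        size *= 2           # 2 x 2 UpSampling before each decoder block
        shapes.append(("decoder", size, k))
    shapes.append(("output", size, 1))   # 1-kernel convolution + sigmoid
    return shapes

for stage, size, kernels in trace_shapes():
    print(f"{stage:8s} {size:4d} x {size:<4d} kernels={kernels}")
```

Under these assumptions the bridge operates on 16 $\times$ 16 feature maps, and the four upsampling steps return the output to the 256 $\times$ 256 input size.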
3.3. Training and Testing
At this stage, the images fed to the RES2U-Net architecture during the learning process
measure 256 $\times$ 256. The data consists of 680 images for training, 170
images for validation, and 150 images for testing. The training process is carried
out for 200 epochs with 41 batches per epoch. At the beginning of training, the
weights are initialized, and the loss is then calculated for each epoch.
The weights are saved if the validation loss is smaller than in the previous epoch,
and the weights are updated for the next epoch. The RES2U-Net architecture involves
several operations such as Convolutional Layer, Batch Normalization, ReLU, UpSampling,
and Sigmoid. The accuracy and loss graphs during training and validation using
the RES2U-Net architecture can be seen in Fig. 6.
Fig. 6. RES2U-Net training and validation graph. (a) Accuracy and (b) loss for exudate
segmentation on retinal image.
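The weight-saving rule used during training can be sketched as a simple loop over validation losses. This sketch interprets "smaller than the previous epoch" as "smaller than the best validation loss seen so far", which is a common checkpointing convention and an assumption here; the loss values are hypothetical and are not results from the study.

```python
def best_checkpoint(val_losses):
    """Return the best validation loss and the epochs at which weights are saved."""
    best_loss = float("inf")
    saved_epochs = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:      # improvement: keep these weights
            best_loss = loss
            saved_epochs.append(epoch)
    return best_loss, saved_epochs

# Hypothetical validation losses for five epochs.
best, saved = best_checkpoint([0.32, 0.21, 0.25, 0.18, 0.19])
print(best, saved)
```

In this example the weights from epochs 3 and 5 are discarded because they do not improve on the best loss so far, so the weights used for testing come from epoch 4.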
In Fig. 6(a), accuracy rises sharply in the first epoch,
reaching 87.98%. The following epochs show a gradual increase with very small differences
until the last epoch, where accuracy was 98.18%. Fig. 6(b) shows that the loss decreased during training. Loss in the first epoch
was 31.87%. The loss curves show no sign of overfitting, and the gap
between training loss and validation loss is small. The architecture performs
very well, as it achieves accuracy above 95% and a loss
close to 0. The sensitivity and specificity graphs during training and validation using
the RES2U-Net architecture can be seen in Fig. 7. In Fig. 7(a), sensitivity is 23.45% in the first epoch. The following epochs show quite
a large increase until epoch 20, after which sensitivity stabilizes with a small increase until the last epoch.
Fig. 7(b) shows that the initial specificity values during training and validation, 14.86% and 3.33% respectively,
are not far apart. From the second epoch onward, the
specificity is above 50% and stable until the last epoch. The F1-Score and IoU
graphs during training and validation using the RES2U-Net architecture can be seen
in Fig. 8.
Fig. 7. RES2U-Net training and validation graph. (a) Sensitivity and (b) specificity
for exudate segmentation on retinal image.
Fig. 8. RES2U-Net training and validation graph. (a) F1-Score and (b) IoU for exudate
segmentation on retinal image.
In Fig. 8(a), the F1-Score obtained in the first epoch was 16.59%. The following epochs showed a gradual
increase, exceeding 56% by the 5th epoch. At the last epoch, the F1-Score
was 87.06%. Fig. 8(b) shows that IoU increased during training. IoU in the first
epoch was 0.1 and increased rapidly to above 0.3 by the 3rd epoch. In the following
epochs, it remains stable until the last epoch. Although there are still gaps between training
and validation in F1-Score and IoU, the architecture shows a good balance between
recall and precision with a high level of similarity to ground truth.
After the training stage on retinal image data, the weights obtained
during the learning process are tested on images that the model has not seen
before. The testing stage aims to determine the ability of the proposed model to
segment retinal images against ground truth. A comparison of several images predicted
by the model with the ground truth can be seen in Table 1. Table 1 shows several original images along with the ground truth used in the testing stage.
The exudate prediction results on retinal images show a close
similarity to the ground truth, although several pixels have
not been identified as exudate. This is shown by the slight difference between the
prediction results and ground truth, especially in small areas of exudate.
Table 1. Comparison of segmented images and ground truth using RES2U-Net for exudate
segmentation on retinal image
Performance evaluation is carried out by comparing prediction results and ground
truth using the Confusion Matrix. Based on the Confusion Matrix obtained,
RES2U-Net provides accuracy of 98.86%, sensitivity of 76.91%, specificity
of 99.56%, and precision of 84.70%. These results show strong overall capability
on both the foreground and background labels. The sensitivity value
shows that the model identifies background labels better than foreground
labels, and the specificity value shows higher performance in identifying background areas.
The precision obtained is quite high, which
shows that most pixels predicted as exudate are truly exudate, with few background
pixels predicted as exudate. The performance measures that capture the balance between majority and minority data are
F1-Score and G-Mean. The F1-Score obtained from IDRiD data was 0.81 and the
G-Mean obtained was 0.88. The IoU is 0.68. Another performance
measure is the Receiver Operating Characteristics (ROC) curve. The ROC curve
summarizes the performance of the Confusion Matrix at all threshold values. Area
Under Curve (AUC) converts the ROC curve to a single number
between 0 and 1. The ROC curve of the RES2U-Net architecture can be seen in Fig. 9. In Fig. 9, the ROC curve approaches the upper left
corner. This means that the model has a good level of sensitivity
(True Positive Rate, TPR), namely 0.769. A fairly high TPR value indicates that the model
can detect most of the exudate in the image. In addition, the False Positive Rate (FPR)
obtained is 0.004, which means that the model has a low error rate
in detecting the background. With an AUC value of 0.88, the
ROC curve illustrates that RES2U-Net has good performance for exudate segmentation
with the ability to accurately separate exudate and background.
Fig. 9. ROC curve using RES2U-Net for exudate segmentation on retinal image.
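The reported values can be cross-checked against one another using only the sensitivity, specificity, and precision stated in the text. The sketch below uses the standard identities for F1-Score, G-Mean, and IoU (for binary masks, IoU = F1 / (2 - F1)). The last line computes the area under a ROC curve drawn through a single operating point; it is an observation that this value matches the reported AUC, not a statement of how the AUC was computed in the study.

```python
import math

# Values reported in the text (as fractions).
sen, spe, prec = 0.7691, 0.9956, 0.8470

f1 = 2 * prec * sen / (prec + sen)
g_mean = math.sqrt(sen * spe)
iou = f1 / (2 - f1)                  # Jaccard index from the Dice/F1 score

# Area under the piecewise-linear ROC through (0, 0), (FPR, TPR), (1, 1).
fpr = 1 - spe
auc_single_point = (sen + 1 - fpr) / 2

print(round(f1, 2), round(g_mean, 2), round(iou, 2), round(auc_single_point, 2))
```

The recomputed F1-Score (0.81), G-Mean (0.88), and IoU (0.68) agree with the reported values after rounding to two decimals.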
4. Discussion
This study has demonstrated the RES2U-NET architecture to segment exudates in retinal
images. The performance results used as a comparison are Accuracy (Acc), Sensitivity
(Sen), Specificity (Spe), Precision (Prec), F1-Score, G-Mean, Intersection over Union
(IoU), and Area Under the Curve (AUC). A comparison of the results of RES2U-NET and
other studies can be seen in Table 2 and Table 3, which contain the results of studies using the IDRiD dataset for exudate segmentation.
Table 2 shows that RES2U-Net achieves the highest result among the compared studies on the IDRiD dataset
for every measure reported.
The accuracy obtained shows that RES2U-Net classifies exudates and backgrounds
very well, although this result is dominated by the background area. The sensitivity of RES2U-Net
indicates that it is capable of segmenting exudates (foreground) in retinal
images, although some exudate pixels are still mistakenly recognized as background
(non-exudate). The specificity result shows that RES2U-Net works excellently in segmenting
background areas (non-exudate). The precision result shows
that the exudate predictions are accurate, but there are still some background
areas that are detected as exudate. The measures used to assess the balance
in handling majority and minority data are F1-Score and G-Mean. F1-Score
measures the balance between precision and recall. G-Mean provides
an overall picture of the model’s performance in handling data balance. The other measures
are IoU and Area Under Curve (AUC). The F1-Score result shows that the segmentation
architecture has a fairly balanced performance between precision and recall with relatively
low errors in recognizing target objects. The G-Mean result approaching 1 indicates
that the model maintains stable performance without bias toward a specific class (object
or background). Table 3 shows that RES2U-Net achieves an IoU of 0.7 (0.68 before rounding, as reported in Section 3.3). This result indicates that the overlap between the RES2U-Net
prediction and the ground truth covers roughly 70% of their union. The AUC
results confirm the ability of the proposed architecture to distinguish foreground
and background pixels. On the IDRiD dataset, RES2U-Net achieves the highest F1-Score,
G-Mean, IoU, and AUC among similar studies. Although the sensitivity and IoU are only
in the good category, both are still the highest results compared to the results of
other studies [9], [27], [28], [29].
Table 2. The comparison of accuracy (Acc), sensitivity (Sn), specificity (Sp), and
precision (Prec) results in the study with other studies
| Method | Acc (%) | Sn (%) | Sp (%) | Prec (%) |
|---|---|---|---|---|
| DL Tensorflow [27] | 96.70 | 41.56 | 98.29 | 41.31 |
| UNet [9] | - | 67.13 | - | - |
| DCNN [28] | 95.93 | 71.19 | 98.32 | - |
| SS-MAF [29] | - | 69.39 | - | - |
| RES2U-NET | 98.86 | 76.91 | 99.56 | 84.70 |
Table 3. The comparison of F1-Score, G-Mean, IoU, and AUC results in the study with
other studies
| Method | F1-Score | G-Mean | IoU | AUC |
|---|---|---|---|---|
| DL Tensorflow [27] | 0.42 | 0.64 | 0.26 | - |
| UNet [9] | 0.80 | - | 0.67 | - |
| DCNN [28] | - | 0.84 | - | - |
| SS-MAF [29] | 0.73 | - | 0.57 | 0.86 |
| RES2U-NET | 0.81 | 0.88 | 0.7 | 0.88 |
The accuracy result above 90% indicates that RES2U-Net has excellent performance in
segmenting both the exudate area (foreground) and the background, but the sensitivity results
show that RES2U-Net works better in the background area than on exudates. These results
indicate the need for further research to improve the performance of RES2U-Net in
detecting exudates (foreground) in retinal images. The RES2U-Net architecture is capable
of segmenting hard and soft exudates in retinal images. The study provides an alternative
CNN architecture that can be developed as a model for building an automatic system
for early detection of retinal abnormalities.
5. Conclusion
Based on the study that has been carried out, the implementation of RES2U-Net provides
excellent performance in exudate segmentation of retinal images. This can
be seen from the accuracy of more than 90%, which shows that RES2U-Net
performs well overall. The sensitivity, specificity, and precision
obtained show good ability in identifying background areas in the image and objects
that are predicted to be relevant. In addition, the model can separate the background
and foreground, as seen in the F1-Score of 0.81, which shows
a good balance between precision and recall. The G-Mean and AUC show
that RES2U-Net has a good ability to differentiate exudate and background. The sensitivity
shows that the model detects exudates quite well. Although RES2U-Net was successful
in identifying exudates and had a good balance in separating exudates from the background,
the IoU shows that the overlap between segmentation results and ground truth still needs
to be improved so that segmentation results become more valid and accurate.
Acknowledgment
The study/publication of this article was funded by DIPA of the Public Service Agency
of Universitas Sriwijaya 2024. No. 0098.122/UN9/SB3.LP2M.PT/20244, On November 21,
2023. Under the Rector’s Decree Number. 00l3/UN9/LP2M.PT/2024, On May 20, 2024.
Anita Desiani is an associate professor at the Department of Mathematics, Faculty of
Mathematics and Natural Sciences, Universitas Sriwijaya. She received a bachelor’s degree
in mathematics from Universitas Sriwijaya in 2000, a master’s degree in computer science
from Universitas Gadjah Mada in 2003, and a doctoral degree in mathematics and science
in 2022. Her current research interests include data mining, image processing,
pattern recognition, computer vision, and artificial intelligence.
Bambang Suprihatin received his bachelor’s degree in mathematics from Universitas
Sriwijaya, Indonesia, in 1994, and an M.Sc. degree in mathematics from the Bandung
Institute of Technology (ITB), Bandung, Indonesia, in 2002. He became an Associate Professor
in 2011. He received his doctorate in mathematics from Universitas Gadjah
Mada (UGM) in 2016. His current research interests are statistics and modeling. He
has authored books on statistics and mathematics.
Muhammad Suedarmin was born in Muara Bungo on 7 November 2000. He received a bachelor’s
degree in mathematics from Universitas Sriwijaya in 2022. In 2019, he joined the Computer
Laboratory of the Faculty of Mathematics and Natural Sciences, Universitas Sriwijaya,
as an Assistant Lecturer. His current research interests include image processing,
pattern recognition, computer vision, data mining, and artificial intelligence.
Siti Rusdiana Puspa Dewi is a lecturer in the Dentistry Study Program at the Faculty
of Medicine, Universitas Sriwijaya. She graduated from North Sumatra University in
2004 as a dentist. In 2015, she continued her studies toward a master’s degree in biomedical
science at Universitas Sriwijaya. She is currently enrolled as a student in the Bioscience
Doctoral Program at the Faculty of Medicine, Universitas Sriwijaya.
Akmal Junaidi is an associate professor at the Department of Computer Science, Universitas
Lampung, Indonesia. He received his Ph.D. degree in computer science from TU Dortmund,
Germany in 2016, an M.Sc. degree in telematics from Universiteit Twente, The Netherlands
in 2003, and a B.Sc. in Mathematics from Universitas Sriwijaya, Indonesia in 1995.
His research interests include deep learning, image processing and analysis, pattern
recognition, natural language processing, and steganography.
Muhammad Arhami received a bachelor’s degree in mathematics from Universitas Syiah Kuala
in 2000, and a master’s degree in computer science from Universitas Gadjah Mada
in 2004. He became an associate professor in 2013. His research fields are artificial
intelligence, mathematics, data mining, software engineering, and data structures.
He has authored books on artificial intelligence, expert systems,
MATLAB programming, and mathematics.