Md. Allmamun (1), Fahima Akter (1), Muhammad Borhan Uddin Talukdar (2), Sovon Chakraborty (3), and Jia Uddin (4)

(1) Department of Computer Science and Engineering, European University of Bangladesh, Dhaka, Bangladesh
    {mdallmamunridoy, fahimakalam354}@gmail.com
(2) Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
    borhan.talukdar4466@gmail.com
(3) Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh
    sovon.chakraborty@ulab.edu.bd
(4) AI and Big Data Department, Woosong University, Daejeon, Korea
    jia.uddin@wsu.ac.kr

Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Drone detection, UAV, Object detection, Xception, OpenCV, Computer vision, Drone tracking, Deep learning
1. Introduction
Unmanned aerial vehicles (UAVs), often known as drones, are flying objects that
can be controlled remotely and programmed to carry out various tasks. Drones can take
photographs and videos and collect data from the air because they are equipped with
cameras, sensors, and other technologies [1]. Aerial photography and videography, conducting surveys, animal and crop surveillance,
rescue operations, food and product deliveries, and military activities are just a
few of the tasks UAVs can do. UAVs are available in various sizes, from tiny quadcopters
that can fly indoors to huge crewless aircraft employed in different military operations.
Despite the rising popularity of these items, their misuse can pose a serious threat.
For example, spying on people without their consent, flying near restricted areas
(e.g., airports, military bases, and administrative buildings), interfering with public
events (e.g., concerts, parades, and sporting events), and delivering illegal goods
(e.g., narcotics or weapons). Drones have recently gained media attention for infiltrating
high-security places and flying over prohibited regions. Gatwick Airport in the UK
was closed for 36 hours in December 2018 after a drone was seen flying close by, delaying
the travel plans of hundreds of travelers. In January 2021, a man was detained for
flying a drone in a prohibited area near the White House [2]. A person was detained in
March 2021 for using a drone to carry narcotics to a South Carolina jail [3]. In April 2022, gun dealers smuggled 11 handguns from the USA into Canada using a
large drone [4].
Therefore, monitoring flying drones has become indispensable. On the other hand, various
safety and security measures are already in place to forestall unlawful activities.
For example, no-fly zones have been established by some drone manufacturing companies
for different sensitive regions, such as airports, jails, and power plants, which
prohibit drones from flying within a 25-kilometer radius [5].
However, no-fly zones have very limited influence, and not all drones are
equipped with these built-in safety measures. Therefore, the development of real-time
drone detection systems is rapidly advancing. Conventional drone detection systems
are based on one of the following four approaches: radar, acoustic, visual, and radio
frequency [6]. This research utilizes a fine-tuned, robust deep learning architecture to detect
drones from images, video clips, and real-time videos using advanced computer vision
approaches.
1.1 Detection of Drones using Acoustics
Detecting drones using acoustics is a technique that utilizes microphones to capture
the sound of nearby drones. The position, altitude, and movement direction of a drone
can be estimated by identifying and interpreting its distinctive acoustic signature. Although this technique
is instrumental for detecting drones at night or in low-light situations, weather
factors, including wind, rain, and snow, can affect the accuracy and range of the
detection system. Ambient noise, traffic noise, and industrial noise can also alter
the acoustic profile of a drone, making it more elusive to detect. The acoustic sensors
can also be exposed to sounds produced by small animals, birds, and flying insects,
contributing to false-positive detections. Moreover, drones must be in proximity to
the detection system to be detected because of the restricted acoustic sensor range.
On the other hand, the inability of acoustic sensors to offer details on the size,
shape, or kind of drones may restrict their utility in identifying and locating drones.
1.2 Detection of Drones using Radar
Another popular method for locating and following drones in the air is by radar. Radars
are tools that identify and detect items nearby using electromagnetic radiation. They
send out a signal that bounces off nearby objects and returns to the radar, enabling
it to determine the size, speed, and direction of an object. Finding drones requires
radar systems to detect small, low-flying objects with radar cross-sections often
much smaller than those of conventional aircraft. In addition, the detection range of a drone can be influenced
by factors such as geography, weather, size, and speed. Miniature drones are particularly
challenging to detect using radar owing to their narrow radar cross-sections and propensity
for false positives. Once a drone is discovered, it might be difficult to establish
its precise location, particularly if it keeps moving erratically.
1.3 Visualized Drone Detection
Visualized drone identification uses thermal or optical cameras to identify and track
drones visually. This technique uses image analysis from photographs, videos, and
webcams to locate and track drones. Visual drone identification may be highly accurate
when combined with machine learning algorithms that can assess the images in real-time
to identify drones. This technology is a flexible choice for drone detection because
it can be employed in various settings and lighting situations. Visualized drone identification
can also have a lower false-positive rate than other detection techniques owing to
its ability to identify drones precisely based on their visual features. The proposed
approach uses drone images, videos, webcams, and different computer vision algorithms
to detect drones.
This research focused on the instant detection of drones from videos and images so
that no harmful UAV goes unnoticed. Instant detection of UAVs will minimize risk in
numerous respects, making this research valuable for the security systems of a country.
2. Literature Review
Ahmed et al. [7] developed a real-time drone detector using machine learning. They exploited gray-level
co-occurrence matrix (GLCM) and speeded-up robust features (SURF) features
by modifying the resolution structure of the input photographs and tweaking the size
parameters of the anchor box. An anti-drone dataset tagged with a KCF tracker and
a drone dataset from the University of Southern California were used to train the
model. The model generated promising results in real-time detection at a reasonable
system cost. Based on the footage taken using stationary cameras, Wang et al. [8] suggested a detection method for UAVs. Moving items were recognized using the temporal
median background removal technique and global Fourier descriptors. The local histogram
of oriented gradients (HOG) features was extracted from pictures of moving objects.
The SVM classifier conducted classification and recognition using the combined Fourier
descriptor (FD) and HOG features. The authors demonstrated experimentally that the
suggested FD and HOG algorithms could accurately categorize birds and drones better
than the GFD algorithm. The accuracy of the proposed recognition technique was 98%.
Peng et al. developed a sizable training set of 60,480 generated photographs to identify
UAVs [9]. In the manually annotated UAV test set, the faster R-CNN network performed with
an average accuracy of 80.69%, compared to 43.03% in the pre-trained common objects
in context (COCO) 2014 dataset and 43.36% in the PASCAL visual object classes (VOC)
2012 dataset. Compared to previous techniques, the average accuracy of the faster
R-CNN detection network was considerably greater when trained on rendered pictures.
Singha et al. [10] developed an automated drone detection system using YOLOv4. The model was evaluated
using the mean average precision (mAP), frames per second (FPS), precision, recall,
and F1-score on drone and bird datasets. The results outperformed other research of
a similar kind, with an F1-score of 0.79, a precision of 0.95, a recall of 0.68, and
a mAP of 74.36%. Shakila et al. [11] suggested a model using YOLOV8 with an accuracy of 98.33% for drone detection and
97.5% for drone classification. The findings were achieved using a dataset of 10,000
images. The convolutional neural network (CNN) architecture was used for pre-processing,
feature extraction, and classification as part of the authors' deep learning-based
methodology. Hamatapa et al. [12] suggested two techniques for using motion detection and image processing to find
and follow a UAV at a distance of 350 feet during the day. Four different drone models—the
Phantom 4 Pro, Agras MG-1S, Pocket Drone JY019, and Mavic Pro—as well as birds and balloons
were used to evaluate the system. C. Aker et al. [13] modified and improved the single-stage YOLOv2 [14] method to distinguish drones from birds in videos and estimate their location. The
researchers blended real drone and bird photographs with coastal video footage to
develop a synthetic dataset. The proposed network was assessed using precision-recall
(PR) curves, where the precision and recall levels reached 0.90. Magoulianitis et al.
[15] pre-processed the pictures using the deep CNN with skip connection and network-in-network
(DCSCN) super-resolution technique [16] before utilizing the Faster-RCNN detector. As a result, the detector could identify
drones very far away, and its recall ability was improved. The task yielded recall
and accuracy scores of 0.59 and 0.79, respectively.
Fine-tuned VGG16 [17], VGG19 [18], ResNet50 [19], InceptionV3 [20], Xception [21], ResNet101v2 [22], and MobileNetV2 [23] architectures were used in this study to detect drones. The outcomes were then scrutinized
to render the optimal result. Photographs of drones were collected from Roboflow to
train these deep-learning architectures. The mean squared error (MSE) was used to
assess the model performance. The proposed approach performed significantly better
with higher accuracy and negligible loss than earlier efforts.
3. Methodology
This study concentrated on detecting and tracking drones from images, video clips,
and real-time videos. The methodology of this study is broken down into several subsections.
3.1 Dataset Preparation
Preparing the dataset is a fundamental step. A balanced, high-quality dataset enhances the likelihood of a model succeeding in its
trained task. On the other hand, a poor, noisy, chaotic, and narrow dataset can lead
to mediocre performance. This study collected a dataset containing drone images of
various models and sizes captured from different angles. The dataset was sourced
from Roboflow. Fig. 1 presents a few samples from the dataset. Once the images were acquired, some pre-processing
was required before they were sent to the model for training.
Fig. 1. Sample images from the dataset.
3.2 Image Pre-processing
Data seldom comes in perfect forms. Hence, a certain degree of pre-processing is necessary
to ensure optimal outcomes. The dataset used in this study also underwent pre-processing
wherever required. The original dataset comprised images with no annotations. The images
were annotated first so the model could learn from them. The annotation was accomplished
using the Roboflow annotator tool, and annotated data were standardized in XML format.
Fig. 2 presents the annotated images.
Fig. 2. Image instances after annotation.
Fig. 3. Sample of a pre-processed image of size 224×224.
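Because the annotations are stored in XML, reading one back into Python would typically look like the following sketch; the Pascal VOC-style tag names and the file name are assumptions for illustration, not the authors' exact schema.

    import xml.etree.ElementTree as ET

    def read_annotation(xml_path):
        """Return the image size and the first bounding box from a VOC-style XML file."""
        root = ET.parse(xml_path).getroot()
        width = int(root.find("size/width").text)
        height = int(root.find("size/height").text)
        box = root.find("object/bndbox")
        xmin = int(box.find("xmin").text)
        ymin = int(box.find("ymin").text)
        xmax = int(box.find("xmax").text)
        ymax = int(box.find("ymax").text)
        return (width, height), (xmin, ymin, xmax, ymax)

    # Example (hypothetical file name): (w, h), bbox = read_annotation("drone_0001.xml")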
In total, 8167 color drone images and their annotation XML files were utilized to train
the model. The dataset was well-balanced, and the images had a wide variety to ensure
the model could learn the features properly. Another issue that had to be addressed
was rescaling. The images were of different sizes, and rescaling to 224×224 was best
suited to the present needs. Hence, the images were downsized to unified dimensions
of 224×224. The pixels were manipulated using Eq. (1),
where w, h, w', h', and M are the old width, the old height, the new width, the new
height, and the maximum value in the matrix, respectively.
Although the images now have the specified dimensions, they are not yet ready to be
fed to the model. CNNs demand numerical data as input, so the image data must be
transformed into matrices. The image processing toolbox in MATLAB contains a function
called imread [16] that reads image files as matrices. After converting the images to
3D tensors, the annotation values must be scaled to correspond to the 224×224
image size. The data are then ready for training.
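A minimal sketch of this pre-processing step is shown below, assuming OpenCV is used in place of MATLAB's imread and that pixel values are divided by the matrix maximum M; the function name and I/O details are illustrative assumptions.

    import cv2
    import numpy as np

    TARGET = 224  # unified width and height (w' = h' = 224)

    def preprocess(image_path, bbox):
        """Resize an image to 224x224, normalize its pixels, and rescale its bounding box."""
        img = cv2.imread(image_path)               # image as a matrix (old size w x h)
        h, w = img.shape[:2]
        img = cv2.resize(img, (TARGET, TARGET))    # downsize to 224x224
        img = img.astype(np.float32) / img.max()   # divide by the maximum value M
        xmin, ymin, xmax, ymax = bbox
        sx, sy = TARGET / w, TARGET / h            # scale factors for the annotation values
        return img, (xmin * sx, ymin * sy, xmax * sx, ymax * sy)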
3.3 Candidate Models for Drone Detection and Tracking
The authors used seven fine-tuned deep-learning models for drone detection and tracking.
The model with the optimal performance was chosen as the desired model. The examined
models were as follows: VGG16 [17], VGG19 [18], ResNet50 [19], InceptionV3 [20], Xception [21], ResNet101v2 [22], and MobileNetV2 [23].
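As an illustration of how one of these candidates could be fine-tuned for bounding-box regression, the sketch below attaches a single dense output layer with linear activation (as described in Section 4.1) to a pretrained Keras backbone; the global-average-pooling step and the build_detector helper are assumptions, not the authors' exact architecture.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    def build_detector(backbone_name="Xception"):
        """Attach a 4-value bounding-box regression head to a pretrained backbone."""
        backbone_fn = getattr(tf.keras.applications, backbone_name)  # e.g., VGG16, ResNet50
        backbone = backbone_fn(include_top=False, weights="imagenet",
                               input_shape=(224, 224, 3))
        x = layers.GlobalAveragePooling2D()(backbone.output)
        outputs = layers.Dense(4, activation="linear")(x)  # xmin, ymin, xmax, ymax
        return Model(backbone.input, outputs)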
3.4 Drone Detection and Tracking
Once the model is trained on the pre-processed images, it can recognize drones in
previously unseen images. In machine learning frameworks such as Keras and TensorFlow,
the predict function makes predictions on unseen data using a trained model. The
rectangle function from the OpenCV library is then used to draw a rectangular box
around the drone, localizing it in the image. Finally, the localization is visualized
on the image using the matplotlib library. Fig. 4 presents the fundamental steps involved in detecting drones from images.
Fig. 4. Drone Detection Process from Images.
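A minimal sketch of this flow is given below, assuming the trained model regresses a single (xmin, ymin, xmax, ymax) box on the 224×224 scale; the variable names are illustrative.

    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    def detect_and_draw(model, image_path):
        """Predict a drone bounding box, draw it with OpenCV, and show it with matplotlib."""
        img = cv2.imread(image_path)
        h, w = img.shape[:2]
        inp = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0
        xmin, ymin, xmax, ymax = model.predict(inp[np.newaxis])[0]
        sx, sy = w / 224.0, h / 224.0                 # map the box back to the original size
        cv2.rectangle(img, (int(xmin * sx), int(ymin * sy)),
                      (int(xmax * sx), int(ymax * sy)), (0, 0, 255), 2)
        plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        plt.axis("off")
        plt.show()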
Detecting drones from videos requires that the frames first be rescaled to 224×224.
A video is composed of individual still pictures, also known as frames. As the videos
are disintegrated into their constituent frames, the model is fed the frames to locate
the drones. The method used afterward for localization is the same as before. Following
the prediction, the frames are reassembled to constitute videos that display the tracking
of the drones. Fig. 5 shows the process of drone tracking from videos.
Fig. 5. Drone Tracking from Videos.
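The frame-by-frame tracking loop could be sketched as follows; the codec and output settings are assumptions, and the per-frame step mirrors the single-image case above.

    import cv2

    def track_in_video(model, in_path, out_path):
        """Split a clip into frames, localize the drone in each, and rebuild the video."""
        cap = cv2.VideoCapture(in_path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            inp = cv2.resize(frame, (224, 224)) / 255.0
            xmin, ymin, xmax, ymax = model.predict(inp[None])[0]
            sx, sy = w / 224.0, h / 224.0
            cv2.rectangle(frame, (int(xmin * sx), int(ymin * sy)),
                          (int(xmax * sx), int(ymax * sy)), (0, 0, 255), 2)
            writer.write(frame)
        cap.release()
        writer.release()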
The model is also used for drone tracking from real-time videos. This is simulated
by exploiting the webcam of a computer. The VideoCapture function of OpenCV is used
to capture real-time videos. The acquired real-time video is split further into frames
and fed to the model to render the desired outcome, similar to what is done in the
case of tracking drones from video footage.
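Real-time capture from the default webcam with OpenCV's VideoCapture could look like the sketch below; the window handling and exit key are assumptions, and the per-frame detection step is identical to the video case above.

    import cv2

    cap = cv2.VideoCapture(0)                  # 0 selects the default webcam
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # ... run the same resize -> predict -> rectangle steps on `frame` here ...
        cv2.imshow("Drone tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop
            break
    cap.release()
    cv2.destroyAllWindows()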
4. Experimental Result Analysis
4.1 Model Training
The dataset used in this study contained 8167 color images sourced from Roboflow.
The split ratio of the dataset for training, validation, and testing was 0.7, 0.2,
and 0.1, respectively. Seven fine-tuned object detection deep learning models were
used to identify and track drones in photos, videos, and real-time video footage.
The training was done using a Tesla T4 GPU in Google Colab. The batch size was set
to 32, and each of the seven models was trained for 100 epochs. The Adam optimizer with a learning
rate of 0.0001 was used for improved learning. One fully connected dense layer with
a linear activation function was used in the output layer. The mean squared error
(MSE) was used to assess how the models performed. Table 1 shows the performance of the VGG19 model. Higher epoch counts brought a significant
improvement in the accuracy and loss. The optimal accuracy and loss rendered by the
VGG19 model were 98.43% and 7.9067, respectively. The main focus was on the categorical
cross-entropy loss to interpret the results properly.
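Under these settings, the compile-and-fit step might look like the sketch below; the data variables are placeholders for the prepared image tensors and scaled bounding boxes, and build_detector refers to the earlier sketch.

    from tensorflow.keras.optimizers import Adam

    model = build_detector("VGG19")   # any of the seven backbones can be passed here
    model.compile(optimizer=Adam(learning_rate=1e-4),  # learning rate of 0.0001
                  loss="mse",                          # mean squared error
                  metrics=["accuracy"])
    history = model.fit(train_images, train_boxes,
                        validation_data=(val_images, val_boxes),
                        batch_size=32, epochs=100)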
Table 1. Performance record of the VGG19 model.
Epoch No. | Accuracy | Loss
1 | 0.63809 | 2214.1086
9 | 0.86998 | 442.9437
37 | 0.94560 | 105.1398
91 | 0.97887 | 12.9733
98 | 0.98431 | 7.9067
The VGG16 model was trained after the VGG19 model and performed significantly poorer;
hence, its epoch-by-epoch performance breakdown is not shown. The VGG16 model achieved
an accuracy of 98.10% and a loss of 15.02. Table 2 lists the performance of MobileNetV2.
Table 2. Performance Record of the MobileNetV2 model.
Epoch No. | Accuracy | Loss
1 | 0.72304 | 1577.6967
4 | 0.93363 | 117.1033
16 | 0.94871 | 65.7381
31 | 0.97361 | 33.6515
38 | 0.97431 | 30.7491
40 | 0.98036 | 33.0461
51 | 0.98280 | 22.1523
60 | 0.98473 | 19.4641
75 | 0.98773 | 16.8436
92 | 0.98851 | 9.0970
The accuracy of the MobileNetV2 model was 98.851%, which is greater than that of the
VGG19 and VGG16 models. The MobileNetV2 model loss was 9.0970, which was less than
the VGG16 model but more than the VGG19 model.
The accuracy of the ResNet50 model was 98.51%, with a loss of 12.6993 after training
(Table 3). Its accuracy was lower than that of MobileNetV2, and its loss was higher than
those of VGG19 and MobileNetV2. The ResNet101V2 object detection model was then trained
to determine whether it performed better and reached an accuracy of 98.75% with a loss
of 5.5813. The InceptionV3 model was trained afterward, and its performance
is shown in Table 4.
Table 3. Performance Record of the ResNet50 Model.
Epoch No. | Accuracy | Loss
1 | 0.89663 | 314.7631
3 | 0.92109 | 238.1099
12 | 0.95441 | 98.2536
18 | 0.97256 | 58.9677
32 | 0.97563 | 30.5070
35 | 0.97703 | 34.8471
50 | 0.98431 | 25.5513
54 | 0.98457 | 16.3150
68 | 0.98510 | 12.6993
Table 4. Performance Record of InceptionV3 Model.
Epoch No. | Accuracy | Loss
3 | 0.90935 | 161.3831
5 | 0.93139 | 124.5265
9 | 0.95467 | 43.0933
12 | 0.97177 | 31.9648
21 | 0.97510 | 27.4512
35 | 0.98220 | 13.0324
69 | 0.98482 | 16.8574
70 | 0.98799 | 10.1683
71 | 0.98948 | 7.4961
The InceptionV3 model outperformed the earlier models in accuracy, reaching 98.94%
with a loss value of 7.4961. The loss, however, must be minimized as much as possible.
Therefore, the final object detection model, Xception, was trained to obtain a lower
loss value. Table 5 outlines how the Xception model performed, providing a detailed
view of the results achieved after proper training and validation of the model.
Table 5. Performance Record of Xception Model.
Epoch No. | Accuracy | Loss
1 | 0.67806 | 2009.8859
2 | 0.81974 | 474.6008
3 | 0.88015 | 232.6906
4 | 0.89514 | 168.8375
5 | 0.90813 | 175.6783
6 | 0.93161 | 116.3893
9 | 0.96370 | 49.2952
28 | 0.97850 | 32.8853
32 | 0.97861 | 21.0095
65 | 0.98561 | 13.7156
78 | 0.98851 | 13.7388
81 | 0.99185 | 3.8355
Performance analysis of the Xception model showed that the model achieved the optimal
accuracy of 99.185%, with the lowest loss achieved thus far (3.8355). Hence, of the
seven deep learning models, the Xception model outperformed other models in terms
of accuracy and loss.
4.2 Performance Comparison
Once the training was accomplished, the performance of each model was assessed, and
the one with optimal performance in all regards was selected. Here, the models were
compared in terms of accuracy, loss, trainable parameters, IoU, picture frame division
time, and other significant factors.
Fig. 6 compares the performance of the models based on the accuracy, and Fig. 7 compares the performance according to the loss. The Xception model performed significantly
better than any other model assessed.
Fig. 6. Comparison of the Accuracy.
Fig. 7. Comparison of Loss among the Models.
Fig. 8 compares the models based on the number of trainable parameters. It is a crucial
measure because it indicates how computationally demanding a particular model is.
The number of trainable parameters can greatly impact the performance when a model
with a higher computational load runs on a device with low computational power.
MobileNetV2 poses the lowest computational load on the device of the seven models,
while the Resnet101V2 was the most computationally demanding (Fig. 8).
Fig. 8. Comparison of the Trainable Parameters.
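The trainable-parameter counts compared in Fig. 8 can be read off a Keras model as sketched below; the counting snippet is an assumption rather than the authors' script, and build_detector refers to the earlier sketch.

    import tensorflow as tf

    def trainable_params(model):
        """Total number of trainable parameters in a Keras model."""
        return sum(tf.keras.backend.count_params(w) for w in model.trainable_weights)

    # Example: trainable_params(build_detector("MobileNetV2"))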
The models were also compared in terms of the intersection over union (IoU). The following
formula was used to determine the IoU of the models on 32 images:

$${\sigma} = \frac{w}{z}$$

where w, z, and ${\sigma}$ are the area of intersection, the area of union, and the
IoU, respectively. Table 6 compares the seven models in terms of IoU.
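In code, the formula ${\sigma} = w / z$ corresponds to the sketch below, assuming boxes are given as (xmin, ymin, xmax, ymax).

    def iou(box_a, box_b):
        """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        w = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)            # area of intersection
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        z = area_a + area_b - w                                   # area of union
        return w / z if z > 0 else 0.0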
The Xception model had the highest inference time per step (96.1 s), while MobileNetV2 had
the lowest (21.3 s) (Table 7). The table also compares the models according to the number of floating-point operations
per second in billions. Although MobileNetV2 has the lowest number of operations,
the Xception model also maintains a reasonable figure of 16.6 billion.
Table 6. Comparing the Models for IoU.
Model Name | Images with IoU above 75% | Images with IoU below 75%
VGG16 | 17 | 15
VGG19 | 15 | 17
InceptionV3 | 19 | 13
ResNet50 | 15 | 17
ResNet101V2 | 22 | 10
MobileNetV2 | 3 | 29
Xception | 27 | 5
Table 7. Comparing the Models according to Inference Time and FLOPs.
Model Name | Inference Time per Step (seconds) | Floating-Point Operations Per Second (in billions)
VGG16 | 64.6 | 16
VGG19 | 87.4 | 19.5
ResNet50 | 49.1 | 17.2
InceptionV3 | 46.6 | 21
Xception | 96.1 | 16.6
ResNet101V2 | 34.6 | 16.3
MobileNetV2 | 21.3 | 12
4.3 Testing Details
4.3.1 Testing Images
Different learning rates and activation functions were used to test the models, but no
significant changes in the results were observed.
This study also focused on reducing the trainable parameters. The results suggested that
the models were overfitted, so dropout was applied to mitigate overfitting. The images
were then tested after all the training issues had been fixed.
Images with various backgrounds were used for testing purposes. If the image contained
any drones, a bounding box was displayed around each one. The red bounding box represents
the bounding-box values predicted by the models, while the green bounding
box displays the actual (ground-truth) bounding-box values, as shown in Fig. 9.
Fig. 9. Test outcomes of Drone images.
4.3.2 Testing Videos
The efficiency of the model was also tested against some real-time webcam videos and
some YouTube videos containing drones.
In both real-time videos and video clips, the drone was tracked continuously as it moved
across frames. For both types of videos, the footage was split into picture frames by
initially setting the FPS value to 10. Table 8 lists the processing time for each frame, and Fig. 10 shows the sequential localization of the drone in individual frames.
Fig. 10. Drone tracking from a video.
Table 8. Image Frame time.
Frame No. | Time
Frame 1 | 24 ms
Frame 2 | 48 ms
Frame 3 | 43 ms
Frame 4 | 35 ms
Frame 5 | 66 ms
5. Conclusion and Future Works
Drones have become prevalent today because of their extensive usability. This, in
turn, poses some threats regarding their unethical and unlawful use. Hence, drone
detection is necessary to forestall unfortunate events and damage. On the other hand,
drones can be challenging to detect at different elevations using conventional drone
detection methods because of their small size, quick speed, and high altitude. This
study assessed seven fine-tuned DL object detection models for drone detection and
tracking to determine the most robust. After extensive training and testing, the Xception
model yielded the best performance. Although all the models were fair candidates for
detecting and tracking a single drone, the models performed poorer when many drones
were present in the image. Future studies will focus on building advanced models to
facilitate the detection and tracking of multiple drones.
ACKNOWLEDGMENTS
This research is funded by Woosong University Academic Research 2024.
REFERENCES
E. Kaufmann, ``Champion-Level Drone Racing using Deep Reinforcement Learning: Supplementary
Data.'' Zenodo, 2023. doi: 10.5281/ZENODO.7955278.
A. Rejeb, K. Rejeb, S. J. Simske, and H. Treiblmaier, ``Drones for supply chain management
and logistics: a review and research agenda,'' International Journal of Logistics
Research and Applications, vol. 26, no. 6. Informa UK Limited, pp. 708-731, Sep. 24,
2021. doi: 10.1080/13675567.2021.1981273.
A. Sahebi-Fakhrabad, A. H. Sadeghi, E. Kemahlioglu-Ziya, R. Handfield, H. Tohidi,
and I. Vasheghani-Farahani, ``The Impact of Opioid Prescribing Limits on Drug Usage
in South Carolina: A Novel Geospatial and Time Series Data Analysis,'' Healthcare,
vol. 11, no. 8. MDPI AG, p. 1132, Apr. 14, 2023. doi: 10.3390/healthcare11081132.
K. Flemons et al., ``The use of drones for the delivery of diagnostic test kits and
medical supplies to remote First Nations communities during Covid-19,'' American Journal
of Infection Control, vol. 50, no. 8. Elsevier BV, pp. 849-856, Aug. 2022. doi: 10.1016/j.ajic.2022.03.004.
F. Alobaid, N. Mertens, R. Starkloff, T. Lanz, C. Heinze, and B. Epple, ``Progress
in dynamic simulation of thermal power plants,'' Progress in Energy and Combustion
Science, vol. 59. Elsevier BV, pp. 79-162, Mar. 2017. doi: 10.1016/j.pecs.2016.11.001.
H. Eskandaripour and E. Boldsaikhan, ``Last-Mile Drone Delivery: Past, Present, and
Future,'' Drones, vol. 7, no. 2. MDPI AG, p. 77, Jan. 21, 2023. doi: 10.3390/drones7020077.
T. Ahmed, T. Rahman, B. B. Roy, and J. Uddin, ``Drone Detection by Neural Network
Using GLCM and SURF,'' Journal of Information Systems and Telecommunication, vol. 9,
no. 33, pp. 15-24, 2021.
H. Arksey and L. O’Malley, ``Scoping studies: towards a methodological framework,''
International Journal of Social Research Methodology, vol. 8, no. 1. Informa UK Limited,
pp. 19-32, Feb. 2005. doi: 10.1080/1364557032000119616.
K. Peng et al., ``A Hybrid Genetic Algorithm on Routing and Scheduling for Vehicle-Assisted
Multi-Drone Parcel Delivery,'' IEEE Access, vol. 7. Institute of Electrical and Electronics
Engineers (IEEE), pp. 49191-49200, 2019. doi: 10.1109/access.2019.2910134.
S. Singha and B. Aydin, ``Automated Drone Detection Using YOLOv4,'' Drones, vol. 5,
no. 3. MDPI AG, p. 95, Sep. 11, 2021. doi: 10.3390/drones5030095.
S. Rahman, J. H. Rony, J. Uddin, and M. A. Samad, ``Real-Time Obstacle Detection
with YOLOv8 in a WSN Using UAV Aerial Photography,'' Journal of Imaging, vol. 9, no. 10, p. 216, 2023.
R. Hamatapa and C. Vongchumyen, ``Image Processing for Drones Detection,'' 2019 5th
International Conference on Engineering, Applied Sciences and Technology (ICEAST).
IEEE, Jul. 2019. doi: 10.1109/iceast.2019.8802578.
C. Aker and S. Kalkan, ``Using deep networks for drone detection,'' 2017 14th IEEE
International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE,
Aug. 2017. doi: 10.1109/avss.2017.8078539.
M. J. Shafiee, B. Chywl, F. Li, and A. Wong, ``Fast YOLO: A Fast You Only Look Once
System for Real-time Embedded Object Detection in Video.'' arXiv, 2017. doi: 10.48550/ARXIV.1709.05943.
V. Magoulianitis, D. Ataloglou, A. Dimou, D. Zarpalas, and P. Daras, ``Does Deep Super-Resolution
Enhance UAV Detection?,'' 2019 16th IEEE International Conference on Advanced Video
and Signal Based Surveillance (AVSS). IEEE, Sep. 2019. doi: 10.1109/avss.2019.8909865.
C. Dong, C. C. Loy, and X. Tang, ``Accelerating the Super-Resolution Convolutional
Neural Network,'' Computer Vision - ECCV 2016. Springer International Publishing,
pp. 391-407, 2016. doi: 10.1007/978-3-319-46475-6_25.
``Correction: SN Computer Science,'' SN Computer Science, vol. 4, no. 6. Springer
Science and Business Media LLC, Sep. 28, 2023. doi: 10.1007/s42979-023-02168-3.
A. Farjana, F. Tabassum Liza, M. Al Mamun, M. C. Das, and M. Maruf Hasan, ``SARS CovidAID:
Automatic detection of SARS CoV-19 cases from CT scan images with pretrained transfer
learning model (VGG19, RESNet50 and DenseNet169) architecture,'' 2023 International
Conference on Smart Applications, Communications and Networking (SmartNets). IEEE,
Jul. 25, 2023. doi: 10.1109/smartnets58706.2023.10216235.
D. Wu, Y. Ying, M. Zhou, J. Pan, and D. Cui, ``Improved ResNet-50 deep learning algorithm
for identifying chicken gender,'' Computers and Electronics in Agriculture, vol. 205.
Elsevier BV, p. 107622, Feb. 2023. doi: 10.1016/j.compag.2023.107622.
A. Dutta, S. Mitra, M. Basak, and T. Banerjee, ``A comprehensive review on batteries
and supercapacitors: Development and challenges since their inception,'' Energy Storage,
vol. 5, no. 1. Wiley, Apr. 06, 2022. doi: 10.1002/est2.339.
A. Mehmood, Y. Gulzar, Q. M. Ilyas, A. Jabbari, M. Ahmad, and S. Iqbal, ``SBXception:
A Shallower and Broader Xception Architecture for Efficient Classification of Skin
Lesions,'' Cancers, vol. 15, no. 14. MDPI AG, p. 3604, Jul. 13, 2023. doi: 10.3390/cancers15143604.
M. Bruegel, D. Nagel, M. Funk, P. Fuhrmann, J. Zander, and D. Teupser, ``Comparison
of five automated hematology analyzers in a university hospital setting: Abbott Cell-Dyn
Sapphire, Beckman Coulter DxH 800, Siemens Advia 2120i, Sysmex XE-5000, and Sysmex
XN-2000,'' Clinical Chemistry and Laboratory Medicine (CCLM), vol. 53, no. 7. Walter
de Gruyter GmbH, Jan. 01, 2015. doi: 10.1515/cclm-2014-0945.
Md. Allmamun received his B.Sc. in Computer Science and Technology from Weifang University
of Science and Technology. His research interests include computer vision and AI.
Fahima Akter is pursuing her B.Sc. in Computer Science and Engineering at the European
University of Bangladesh. Her dedication and commitment to this field are evident
in her continuous efforts to explore innovative solutions and technologies that can
benefit the armed forces. Fahima is also a technology entrepreneur who leads an AI
company that endeavors to take the AI industry to the next level.
Muhammad Borhan Uddin Talukdar is a Senior Lecturer at the Department of CSE, European
University of Bangladesh. In his spare time, he dedicates himself to writing about
various issues and ideas. His research area includes Computer Vision, AI in Drug Design,
Life Science Informatics, and Natural Language Processing.
Sovon Chakraborty is a Lecturer at the University of Liberal Arts Bangladesh. He completed
his M.Sc. from BRAC University. He is also a Professional member of IEEE. His research
interests include deep learning, aspect-based sentiment analysis, IoT, and industrial fault
diagnosis.
Jia Uddin is an Assistant Professor at Woosong University, South Korea. Before that,
he was an associate professor at BRAC University. He completed his Ph.D. from the
University of Ulsan. His research interests are Signal Processing, Industrial Fault
Diagnosis, and Computer Vision. He has served as the Chair and a reviewer of numerous
renowned Conferences and journals.