IEIESPC(IEIE Transactions on Smart Processing and Computing)

IEIESPC Vol. 15, No. 3, p.410-425

ISSN (online): 2287-5255

Received: 25 September 2025; Revised: 12 November 2025; Accepted: 14 January 2026

DOI: 10.5573/IEIESPC.2026.15.3.410

Review / Survey Paper

A Survey of Deep Learning-Based Network Anomaly Detection: Benchmarking on NSL-KDD, UNSW-NB15, and CICIDS2017

(Seoyeon Choi) ^1 (Songhye Kim) ^1 (Jihyeon Ryu) ^1,^*

(Kwangwoon University, 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Korea)

^*Corresponding Author : Jihyeon Ryu, jhryu@kw.ac.kr

License :

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.(www.theieie.org).

Abstract

Network anomaly detection is a critical component of cybersecurity, enabling the identification of potential intrusions and malicious traffic. In recent years, machine learning and deep learning methods have been widely applied for this purpose. This study systematically reviews research employing the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets and compares the performance of commonly used models. The analysis shows that convolutional neural networks (CNNs) achieve strong results on imbalanced, high-dimensional data by leveraging local patterns and hierarchical feature learning, while long short-term memory (LSTM) networks excel at capturing temporal dependencies. Generative adversarial networks (GANs) further enhance detection performance by addressing data imbalance and producing realistic attack samples. However, CNNs struggle with long-term dependencies, LSTMs incur high computational costs for long sequences, and GANs face training instability and mode collapse. To address these limitations, emerging approaches such as transformers, contrastive learning, and LLM-based multimodal frameworks are gaining attention. This paper highlights the strengths and weaknesses of CNNs, LSTMs, and GANs and outlines promising directions for next-generation network anomaly detection.

Keywords

Time series, Anomaly detection, Deep learning, Computer network

1. Introduction

In today’s digital environment, the Internet has become indispensable for data sharing and communication. However, cyber threats continue to grow in both sophistication and diversity. Attackers frequently target digital assets to steal sensitive information, underscoring the need for robust network security mechanisms capable of protecting both data and core infrastructure. Network anomaly detection, which aims to identify abnormal traffic patterns indicative of security breaches or malicious activity, is therefore a critical component of modern cybersecurity strategies. Despite its importance, anomaly detection faces several challenges. First, detection models often become outdated as new types of attacks rapidly evolve. Second, the ever-increasing volume of network traffic complicates the real-time analysis of large-scale datasets. Third, the overwhelming dominance of normal traffic relative to attack traffic leads to severe class imbalance, which significantly limits detection accuracy. Collectively, these challenges highlight the limitations of traditional detection techniques and motivate the adoption of more advanced machine learning (ML) and deep learning (DL) approaches ^[1].

Previous studies have developed intrusion detection systems using representative benchmark datasets such as NSL-KDD, UNSW-NB15, and CICIDS2017, initially applying methods such as support vector machines (SVM), k-nearest neighbors (kNN), logistic regression, and decision trees. However, because of the period in which these datasets were generated, their traffic characteristics and attack patterns no longer sufficiently reflect the complexity of modern network environments ^[2]. To address this gap, the present study additionally investigates recently collected real-world time-series network datasets such as NF-UQ-NIDS-V2 and CESNET-TimeSeries24. NF-UQ-NIDS-V2 reflects contemporary protocol configurations and incorporates diverse attack scenarios, while CESNET-TimeSeries24 consists primarily of normal traffic and enables the analysis of anomaly detection from a long-term temporal dependency perspective.

Early network anomaly detection research relied predominantly on these traditional ML models, which demonstrated limited performance in high-dimensional network environments and under severe class imbalance, resulting in high false positive rates and low recall. To alleviate these challenges, DL-based methods such as CNN, LSTM, and GAN have been introduced, achieving improvements in local pattern extraction, temporal dependency modeling, and the alleviation of data scarcity for minority attack classes. Nevertheless, CNN-based models remain constrained in capturing long-range sequential dependencies, LSTM-based models incur high computational overhead when processing long sequences, and GAN-based methods often suffer from training instability and mode collapse. These limitations have motivated the exploration of more advanced architectures, such as Transformer-based models, contrastive representation learning, and large-language-model (LLM)-driven multimodal anomaly detection. The main contributions of this study are as follows:

First, we provide a systematic review of existing anomaly detection research centered on widely used benchmark datasets, including NSL-KDD, UNSW-NB15, and CICIDS2017.
Second, instead of merely cataloging CNN-, LSTM-, and GAN-based models by architecture and performance, we analyze the relationship between model characteristics and real-world network attack patterns, clarifying the strengths and limitations inherent to each approach.
Third, we discuss emerging research trends, including Transformer-based sequence learning, contrastive learning for robust representation under class imbalance, and LLM-based multimodal learning, and we outline promising future directions for next-generation network anomaly detection.

2. Related Work

Traditional intrusion detection systems relied mainly on signature-based or statistical approaches, which required predefined rules and exhibited poor adaptability to emerging and unknown attacks. To address these limitations, machine learning and deep learning techniques have been introduced for network anomaly detection, particularly focusing on temporal dependencies and multivariate correlations in network traffic.

Early research efforts explored autoencoder-based and recurrent structures for general time series anomaly detection. Yu et al. ^[3] proposed a filter-augmented autoencoder with learnable normalization to improve the robustness of multivariate anomaly detection. Yu et al. ^[4] introduced DTAAD, a dual TCN attention architecture that captures both local and global temporal dependencies. Liu et al. ^[5] proposed an adversarial reconstruction framework for unsupervised anomaly detection, demonstrating improved detection accuracy through GAN-based learning. Similarly, Munir et al. ^[6] presented DeepAnT, a predictive CNN model that detects anomalies by forecasting normal behavior and computing deviation errors. These approaches achieved notable improvements in reconstructive and predictive accuracy; however, these methods were mainly evaluated on general industrial and sensor time-series datasets, with limited evaluation on network traffic corpora.

To improve interpretability and temporal modeling, Marino et al. ^[7] proposed the Network Transformer (NeT), a self-supervised and interpretable anomaly detection model for industrial control system traffic. Leveraging the self-attention mechanism, NeT simultaneously captures long-range temporal dependencies and inter-feature correlations while enabling explainable decision-making. Xu et al. ^[8] further introduced TGAN-AD, a hybrid Transformer–GAN model that combines the sequence modeling strength of Transformers with GAN-based data reconstruction. These studies demonstrate that attention-based architectures can outperform conventional CNNs and LSTMs in handling heterogeneous and unlabeled data environments.

Other works attempted to generalize deep learning frameworks for foundation-level modeling and hybrid training. González et al. ^[9] proposed Foundation Autoencoders to establish a unified representation model for multivariate time-series anomaly detection, while Golchin and Rekabdar ^[10] combined reinforcement learning, variational autoencoders, and active learning for adaptive detection under dynamic data conditions. These approaches highlight the movement toward self-adaptive and reinforcement-driven anomaly detection models.

In contrast, ML-based intrusion detection for industrial and network data was explored by Anton et al. ^[11, ^12]. Their studies used SVM and Random Forest classifiers to detect anomalies in Modbus and OPC UA traffic within industrial control networks. The results demonstrated that statistical feature-based learning could effectively identify abnormal behaviors but struggled to capture temporal dependencies and contextual relationships between packets.

Finally, Tscharke et al. ^[13] introduced a quantum autoencoder model designed for multivariate time-series anomaly detection, presenting a quantum-inspired approach aimed at improving scalability and computational efficiency when processing large volumes of traffic data. The proposed method showed promising results for high-dimensional enterprise telemetry similar to SAP system environments, yet its effectiveness on standard network intrusion detection datasets has not been fully examined.

Table 1 summarizes studies using these time-series datasets. The reviewed studies were categorized according to datasets, domains, input forms, models (backbone architectures), preprocessing methods, learning methods, extracted time-series features, and evaluation indicators.

Table 1. Categorization of time-series anomaly detection studies.

Category	Reference title	Dataset	Domain	Model (backbone)	Preprocessing	Learning	Evaluation metrics
Deep Learning	A filter-augmented auto-encoder with learnable normalization for robust multivariate time series anomaly detection ^[3]	SWaT, SMD, PSM, MSL, SMAP	ICS, Server, Satellite	NormFAAE (Auto-Encoder)	Learnable normalization; sliding window	Unsupervised	AUC, F-score, PA%K
	DTAAD: Dual TCN-attention networks for anomaly detection in multivariate time series data ^[4]	MSDS, SMD, WADI, SWaT, MSL, SMAP, MBA, UCR, NAB	ICS, Healthcare, General TS	Transformer (TCN+Attention)	Min–max normalization; noise injection; sliding window; POT threshold	Unsupervised	AUC, F1-score
	Time series anomaly detection with adversarial reconstruction networks ^[5]	MIT-BIH ECG, SWaT	Healthcare, ICS	BeatGAN (Auto-Encoder+GAN)	R-peak segmentation; length standardization; normalization	Unsupervised	AUC, F1-score
	DeepAnT: A Deep Learning Approach for Unsupervised Anomaly Detection in Time Series ^[6]	Yahoo, Ionosphere	Web traffic, Signal	CNN	Sliding-window transform; normalization	Unsupervised	F1-score, Precision, Recall
	Self-Supervised and Interpretable Anomaly Detection Using Network Transformers ^[7]	INL ICS PCAP	ICS	Transformer (NeT)	Packet parsing; rolling window; statistical feature extraction	Self-supervised	FPR, ADR
	TGAN-AD: Transformer-based GAN for anomaly detection of time series data ^[8]	Yahoo / general TS	General TS domains	Transformer + GAN	Contextual feature extraction; normalization	Unsupervised	Strong benchmark performance
	Towards Foundation Auto-Encoders for Time-Series Anomaly Detection ^[9]	Mobile ISP operational logs, KDD21 anomaly data	General TS	VAE + DCNN	Normalization pretrain	Unsupervised	Zero-shot anomaly detection
	Anomaly Detection in Time Series Data Using Reinforcement Learning, Variational Autoencoder, and Active Learning ^[10]	Yahoo, Data centers, sensor networks, finance	General TS	RLVAL	Normalization + sampling	Semi-supervised	Improved detection with few labels
Machine Learning	Anomaly-based Intrusion Detection in Industrial Data with SVM and Random Forests ^[11]	Modbus, OPC UA	ICS/OT	SVM, Random Forest	Feature selection (PCA); interpolation; missing values	Supervised	Accuracy, Precision, Recall, F1-score
Statistical / Probabilistic	Time is of the Essence: ML-based Intrusion Detection in Industrial Time Series Data ^[12]	Modbus/TCP ICS Simulation	ICS	MatrixProfile, SARIMA (+LSTM comparison)	One-hot encoding; preliminary feature selection	Unsupervised	Accuracy, F1-score
Statistical / Probabilistic	Quantum Autoencoder for Multivariate Time Series Anomaly Detection ^[13]	Standard TS benchmarks	General TS	Quantum Autoencoder	Normalization	Unsupervised	Outperforms classical autoencoders

Table 2. Standardized comparison of representative time-series anomaly detection models across benchmark datasets. Best values for each dataset are highlighted in bold.

Model (dataset)	AUC	F1	Precision	Recall	Accuracy	PA%K	FPR	ADR	Observations
NormFAAE (Avg) ^[3]	0.7059	0.5261	–	–	–	0.4927	–	–	–
DTAAD (NAB) ^[4]	0.9330	0.9057	–	–	–	–	–	–	–
BeatGAN ^[5]	0.9215$\pm$0.0003	0.7841$\pm$0.0009	–	–	–	–	–	–	–
DeepAnT (Traffic) ^[6]	–	0.87	0.50	0.004	–	–	–	–	–
NeT-Transformer ^[7]	–	–	–	–	–	–	0.0907	0.8471	–
TGAN-AD (SWaT) ^[8]	0.896	0.953	0.918	0.99	–	–	–	–	–
VAE (KDD2021) ^[9]	–	–	–	–	–	–	–	–	Tracking, zero-shot generalization, latent structure
RLVAL (Yahoo) ^[10]	–	0.921	0.894	0.950	–	–	–	–	–
SVM (DS1) ^[11]	–	0.852	0.782	0.936	0.925	–	–	–	–
RF (DS1) ^[11]	–	–	–	–	0.9984	–	–	–	–
SARIMA+LSTM (Industrial TS) ^[12]	–	0.90	–	–	0.985	–	–	–	–
QAE ^[13]	–	–	–	–	0.74	–	–	–	–
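
The metrics listed in Tables 1 and 2 can all be derived from confusion-matrix counts. The following is a minimal sketch rather than the evaluation code of any cited study: the function names and the example counts are hypothetical, the positive class is assumed to be "anomaly", and recall is treated as the detection rate and FPR as the false alarm rate, which is a common but not universal convention.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common detection metrics from confusion-matrix counts.

    Assumes the positive class is "anomaly" and that no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # often reported as detection rate
    fpr = fp / (fp + tn)               # often reported as false alarm rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "fpr": fpr, "accuracy": accuracy, "f1": f1}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Invented counts for illustration only (not taken from any cited study).
m = detection_metrics(tp=90, fp=10, tn=880, fn=20)

# Precision/recall pairs as listed in Table 2.
table2_pairs = {"TGAN-AD": (0.918, 0.99), "RLVAL": (0.894, 0.950), "SVM": (0.782, 0.936)}
f1_check = {name: round(f1_from(p, r), 3) for name, (p, r) in table2_pairs.items()}
```

For these three rows, the harmonic mean of the listed precision and recall gives 0.953, 0.921, and 0.852, which matches the F1 column of Table 2.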

3. Background

3.1. Time Series

A time series is defined as a sequence of data points recorded at regular intervals. Depending on the observation method, time series can be categorized as discrete or continuous. Unlike ordinary data, time-series observations exhibit autocorrelation, meaning past values are statistically related to present ones; thus, it is inappropriate to assume independence between time points. Common examples include stock prices, temperature records, and network traffic, where previous observations directly influence subsequent fluctuations. To facilitate modeling, a time series is typically decomposed into its intrinsic components. These include a trend, which captures long-term directional changes; seasonality, which reflects recurring periodic patterns; and noise, which represents irregular or random fluctuations. Autocorrelation primarily arises from the trend and seasonal components, whereas noise is treated as the independent part. By removing structural dependencies through decomposition, time-series modeling becomes more robust and stable ^[14].
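
The decomposition described above can be illustrated with a classical additive scheme, in which the trend is estimated by a centered moving average and the seasonal component by per-phase averages of the detrended series. This is a minimal sketch under stated assumptions: the series is synthetic, the period is known and odd, and the function names are hypothetical. It is one of several possible decomposition methods, not the procedure of any cited work.

```python
import math
import random


def moving_average(x, window):
    """Centered moving average for an odd window; edge positions are None."""
    half = window // 2
    out = [None] * len(x)
    for i in range(half, len(x) - half):
        out[i] = sum(x[i - half:i + half + 1]) / window
    return out


def decompose(x, period):
    """Additive decomposition x = trend + seasonal + residual (odd period)."""
    trend = moving_average(x, period)
    # Collect detrended values for each phase of the cycle.
    buckets = [[] for _ in range(period)]
    for i, (xi, ti) in enumerate(zip(x, trend)):
        if ti is not None:
            buckets[i % period].append(xi - ti)
    phase_mean = [sum(b) / len(b) for b in buckets]
    offset = sum(phase_mean) / period   # center the seasonal component at zero
    seasonal = [phase_mean[i % period] - offset for i in range(len(x))]
    residual = [None if ti is None else xi - ti - si
                for xi, ti, si in zip(x, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic series: linear trend + period-7 cycle + small Gaussian noise.
random.seed(0)
period = 7
series = [0.5 * t + 3.0 * math.sin(2 * math.pi * t / period) + random.gauss(0, 0.1)
          for t in range(70)]
trend, seasonal, residual = decompose(series, period)
```

After the trend and seasonal components are removed, the residual contains mostly the injected noise, which is the part that can reasonably be treated as independent.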

3.2. Types of Time Series

3.2.1 Univariate Time Series (UTS)

A univariate time series (UTS) refers to a sequence of observations of a single variable recorded over time. Examples include hourly temperature measurements, daily precipitation levels in a given region, or fluctuations in stock closing prices. Formally, a UTS can be defined as follows:

(1)

$ X = (x_1, x_2, \dots, x_t), \quad x_i \in \mathbb{R}, \ i \in T, $

where $x_i$ represents the observation at time point $i$, and $T = \{1, 2, \dots, t\}$ represents the set of time indices. Because a UTS includes only a single variable, its structure is relatively simple; however, the data may still contain trends, seasonality, and noise. UTS analysis is applied in various domains, including forecasting, anomaly detection, and signal processing. Traditional statistical methods such as the autoregressive integrated moving average (ARIMA) model and exponential smoothing have been widely used. More recently, deep learning techniques such as LSTM and transformer-based models have enabled accurate modeling of long-term dependencies and nonlinear patterns, even in single-variable data. Therefore, UTS is considered the most basic and important subject in time-series analysis ^[15, ^16].

3.2.2 Multivariate Time Series (MTS)

A multivariate time series (MTS) consists of multiple variables recorded simultaneously over time. Each variable is influenced not only by its own historical values (time dependence) but also by its relationships with other variables, often referred to as spatial or inter-variable dependence. For instance, recording air pressure, temperature, and humidity every hour in a given region represents a typical MTS. Formally, an MTS can be expressed as

(2)

$ X = \{x^1, x^2, \dots, x^M\}, \quad x^m \in \mathbb{R}^T, $

where $M$ denotes the number of variables and $x^m$ is the $T$-dimensional vector of the $m$-th variable. Alternatively, an MTS can be expressed as

(3)

$ X = \{x_1, x_2, \dots, x_T\}, \quad x_t \in \mathbb{R}^M. $

Here, $x_t$ is the vector of $M$ features observed at time $t$. In other words, a multivariate time series can be understood as a sequence of vectors recorded along the time axis. Compared to univariate time series, MTSs are more complex because they simultaneously account for inter-variable correlations and temporal dependencies ^[17].
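
The two equivalent views of an MTS in Eqs. (2) and (3) can be made concrete with a small example. This is only an illustrative sketch: the weather readings are invented, and `sliding_windows` is a hypothetical helper showing how the time-major view is typically cut into fixed-length segments before being passed to a model.

```python
# Variable-major view (Eq. 2): M series, each of length T. Values are invented.
variables = {
    "pressure":    [1012.0, 1011.5, 1011.0, 1010.2],
    "temperature": [21.3, 21.9, 22.4, 22.8],
    "humidity":    [0.55, 0.53, 0.50, 0.48],
}
by_variable = list(variables.values())               # M rows, T columns

# Time-major view (Eq. 3): T vectors, each holding M features.
by_time = [list(x_t) for x_t in zip(*by_variable)]   # T rows, M columns


def sliding_windows(seq, width, step=1):
    """Split a time-major sequence into overlapping windows of `width` steps."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, step)]


windows = sliding_windows(by_time, width=3)          # each window: 3 steps x M features
```

Here $M = 3$ and $T = 4$, so a window width of 3 yields two overlapping windows, each a short sequence of 3-dimensional vectors.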

3.3. Time-Series Anomaly Detection (TSAD)

The preceding subsections characterized time-series data; we now consider methods designed to detect anomalies in such data. Time-series anomaly detection (TSAD) aims to distinguish between normal and abnormal patterns by considering the distributional characteristics of the data, temporal dependencies, and complex multivariate relationships. In previous studies, detection approaches were generally categorized into statistical, machine learning-based, and deep learning-based methods, each offering distinct advantages depending on the nature of the data and analysis objectives ^[14, ^18].

3.3.1 Statistics-based models

Statistics-based anomaly detection techniques use the distributional characteristics of data to distinguish between normal and abnormal observations. These models can be broadly divided into parametric and nonparametric categories. The former assumes that the data follow a specific distribution, whereas the latter makes no prior distributional assumptions ^[19].

1. Parametric models. A parametric model assumes that the data are generated from a known probability distribution and estimates the parameters of that distribution from training data. Because the model complexity remains constant regardless of data size, this approach is suitable for large datasets, and verification can be performed efficiently. However, it depends critically on assuming the correct distribution, which requires prior knowledge of the data; detection performance may degrade significantly if the actual data deviate from this assumption. Typically, a Gaussian distribution is assumed for continuous data, with mean and covariance estimated through maximum likelihood estimation (MLE). For categorical data, a multinomial distribution is often used, and for sequential data, a Markov model is generally considered suitable. In practice, however, many datasets exhibit complex structures that cannot be explained by a single distribution. In such cases, mixture models are applied. A representative example is the Gaussian mixture model (GMM), which combines several Gaussian distributions to estimate the underlying data distribution more precisely.

2. Non-parametric models. A non-parametric model estimates the distribution directly from observed data without assuming that the data follow a specific distribution. Because it is not constrained by prior distributional assumptions, this approach offers greater flexibility, higher autonomy, and effectiveness in capturing complex data structures. However, as the size and dimensionality of the data increase, computational costs increase rapidly, and prior knowledge of distance-based similarity measures may be required. Representative techniques include histogram analysis, kernel density estimation (KDE), and the Parzen window method. Although non-parametric models are highly flexible, they face challenges in computational efficiency when processing high-dimensional or large-scale datasets.
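
The contrast between the two families can be sketched on a small bimodal sample. The example below fits a single Gaussian by MLE (parametric) and a Gaussian-kernel KDE (non-parametric) to the same invented one-dimensional data. The bandwidth of 0.3 and the rule "flag a point if its density is below the lowest density seen in training" are assumptions chosen for illustration, not recommendations from the reviewed literature.

```python
import math


def fit_gaussian_mle(samples):
    """MLE of mean and variance (note the 1/n factor, not 1/(n-1))."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n
    return mu, var


def gaussian_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def kde_pdf(x, samples, bandwidth):
    """Kernel density estimate with a Gaussian kernel."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


# Invented bimodal "normal" data: two clusters around 1 and 5.
train = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
mu, var = fit_gaussian_mle(train)
bandwidth = 0.3   # assumed value; in practice this must be tuned


def flags(pdf):
    """Flag query points whose density is below the minimum training density."""
    threshold = min(pdf(s) for s in train)
    return {x: pdf(x) < threshold for x in (1.05, 3.0, 9.0)}


parametric = flags(lambda x: gaussian_pdf(x, mu, var))
nonparametric = flags(lambda x: kde_pdf(x, train, bandwidth))
```

The single Gaussian places its mean near 3.05, in the empty region between the two clusters, so it treats the point 3.0 as highly typical, whereas the KDE flags it. Both detectors flag 9.0 and accept 1.05. This is the kind of case in which a mixture model or a non-parametric estimate is preferable to a single assumed distribution.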

3.3.2 Machine learning models

1. Unsupervised learning. Unsupervised learning is applied when only input data are available and no outputs or labels are provided. The model identifies the inherent structure of the data and extracts hidden patterns. This method has the advantage of automatically detecting new types of abnormal behavior without requiring labeled anomalies. Therefore, it is often used to build a collection of abnormal behaviors through system state monitoring or analysis of past time series, which can subsequently serve as training data for supervised learning models.

2. Supervised learning. Supervised learning is used when both input data and corresponding output labels are available. The model is trained on datasets in which normal and abnormal cases are clearly labeled, enabling it to learn the boundary between them and classify new input data. This approach generally achieves high accuracy and is particularly effective in detecting precursors of anomalies that appear before specific events. However, its application is limited because it requires a sufficient amount of labeled abnormal data.

3. Semi-supervised learning. Semi-supervised learning is applied when only normal data are labeled and provided. After the model learns normal patterns, it identifies data that deviate from these patterns as abnormal. This is the most widely used approach to time-series anomaly detection in the literature and is suitable for realistic scenarios where normal data are abundant but abnormal data are scarce. Although often grouped with unsupervised learning, it is distinguished by its reliance on prior knowledge of normal data.

4. Reinforcement learning. Reinforcement learning is a method in which a system interacts with a dynamic environment under a reward-and-punishment mechanism and learns the optimal behavioral strategy. In this process, goal achievement is evaluated using a value function, and the model gradually learns the optimal policy based on this feedback. Unlike traditional supervised or unsupervised learning, reinforcement learning improves performance through continuous interaction with the environment. In time-series settings, it is mainly applied to dynamic decision-making problems such as autonomous control, resource management, and learning response strategies in the presence of anomalies ^[20].

3.3.3 Deep learning models

Deep-learning-based time-series anomaly detection techniques aim to learn normal patterns and then identify data that deviate from those patterns as anomalies. This approach is generally divided into prediction-based, forward-based, and reconstruction-based techniques. More recently, encoding- and distance-based methods have also been proposed, expanding the scope of application. Prediction-based techniques forecast future values from past data and use the difference between predicted and actual values as an index to determine abnormality. In contrast, reconstruction-based techniques compress normal data into a latent space and attempt to restore it, detecting anomalies when the reconstruction error is large. Encoding-based techniques convert data into low-dimensional representations and calculate anomaly scores directly from the latent representation without a restoration process. Distance-based techniques compute the similarity or distance between data points and identify those far from the normal range as anomalies. These methods differ in several aspects, including learning paradigm (supervised, unsupervised, semi-supervised, or self-supervised), anomaly score calculation (prediction error, reconstruction error, latent representation, or distance), and data representation (sliding window, space-time graph, or embedding). Each approach has distinct advantages and disadvantages. Therefore, deep learning-based time-series anomaly detection is not limited to a single approach; instead, models are selected and applied according to the data characteristics and application environment ^[21].
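
The prediction-based scheme described above can be sketched without a neural network by substituting a simple linear forecaster for the deep model. In the example below, a first-order autoregressive model fitted by least squares stands in for a learned predictor, and the anomaly score is the absolute prediction error. The data, the function names, and the threshold margin of 1.0 are all assumptions made for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] ~ a * x[t-1] + b on normal data."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def prediction_errors(series, a, b):
    """Anomaly score per step: |actual - predicted| for each consecutive pair."""
    return [abs(y - (a * x + b)) for x, y in zip(series[:-1], series[1:])]


# Hypothetical normal behavior: a steadily increasing counter.
normal = [10 + 0.5 * t for t in range(20)]
a, b = fit_ar1(normal)

# Threshold from training errors plus an arbitrary margin (an assumption).
threshold = max(prediction_errors(normal, a, b)) + 1.0

# Test sequence with one injected spike (35.0).
test_seq = [20.0, 20.5, 21.0, 35.0, 22.0]
scores = prediction_errors(test_seq, a, b)
alarms = [s > threshold for s in scores]
```

The spike produces two large errors, one when the series jumps to 35.0 and one when it returns, so two consecutive steps are flagged. A reconstruction-based detector follows the same scoring pattern, with the reconstruction error taking the place of the prediction error.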

4. Datasets for Time-Series NIDS

Representative public datasets are widely used for the performance evaluation of network intrusion detection systems (NIDS).

4.1. UNSW-NB15

Fig. 1. Correlation matrix between features and the binary class label in the UNSW-NB15 dataset.

The UNSW-NB15 dataset is a benchmark dataset for intrusion detection research, created in 2015 at the University of New South Wales (UNSW) Cyber Range Lab in Australia to reflect real-world network conditions. Using the IXIA PerfectStorm tool, raw network traffic consisting of both normal activity and a variety of contemporary attack types was generated and subsequently processed using Argus and Bro-IDS to extract 49 network flow, content-based, and time-based features. Correlation analysis revealed that attack traffic exhibits distinctive temporal and session-level behavioral patterns. In particular, features such as $sttl$, $state$, $ct\_dst\_sport\_ltm$, $ct\_src\_dport\_ltm$, $rate$, and $ct\_state\_ttl$ demonstrated strong positive correlations with attack classes. This aligns with the observation that malicious traffic typically involves frequent repetitive connection attempts within short time intervals, abnormal session termination states characteristic of scanning or DoS activities, and persistent unusual Time-to-Live (TTL) patterns. Accordingly, these features serve as meaningful indicators for distinguishing attack behavior. Conversely, features such as $proto\_freq$, $id$, $swin$, $dwin$, $dload$, $stcpb$, $dtcpb$, $tcprtt$, and $synack$ exhibited negative correlations with the attack label, indicating an association with normal traffic. Normal network communications generally display stable and consistent values for parameters such as TCP window size, data transfer volume, and round-trip time (RTT), whereas these values tend to fluctuate irregularly during attack events. Thus, normal traffic is characterized by session stability, while attack traffic manifests abnormal patterns in connection initiation and session maintenance ^[22].

4.2. NSL-KDD

Fig. 2. Correlation matrix between features and the binary class label in the NSL-KDD dataset.

The NSL-KDD dataset is a widely used benchmark in intrusion detection system (IDS) research. It was reconstructed from the KDD’99 dataset to address the issue of redundant samples and to ensure a more balanced distribution of data between training and testing sets. The dataset consists of both normal traffic and various types of network-based attacks, with each instance represented by 41 network connection features and a corresponding class label indicating whether the connection is normal or an attack. In this study, for clarity of analysis, the class labels were simplified into a binary form: normal and attack. The features of the NSL-KDD dataset can be categorized into four major groups. First, the Basic Features, such as $protocol\_type$, $service$, $flag$, $src\_bytes$, and $dst\_bytes$, represent fundamental communication characteristics, including the protocol used and the number of bytes transmitted in each direction. Second, the Content Features, including $num\_failed\_logins$ and $logged\_in$, reflect packet content and authentication-related activities. Third, the Traffic Statistical Features, such as $count$, $srv\_count$, $dst\_host\_count$, and $dst\_host\_srv\_count$, capture traffic patterns by measuring the frequency of connections to the same host or service within a certain time window. Finally, the Error/Reset Rate Features, including $serror\_rate$, $srv\_serror\_rate$, and $dst\_host\_serror\_rate$, indicate abnormal handshake failures or connection resets in TCP sessions, which are highly indicative of potential attacks. Pearson correlation analysis between the class label and each feature revealed that attack traffic is characterized by abnormal connection terminations and frequent session initiation attempts.

Specifically, $serror\_rate$, $srv\_serror\_rate$, and $dst\_host\_serror\_rate$ showed a strong positive correlation with attacks, as these features typically increase in attack types such as DoS and port scans, where the TCP three-way handshake fails to complete normally and the proportion of abnormal SYN packets rises sharply. Similarly, $count$ and $srv\_count$ exhibited a strong positive correlation with attack traffic, consistent with the tendency of attacks to generate numerous connection attempts within a short time period. In contrast, $logged\_in$, $srv\_diff\_host\_rate$, and $dst\_host\_same\_srv\_rate$ displayed negative correlations with attack traffic, as normal users tend to perform successful authentications and maintain persistent connections to specific services, whereas attack traffic often demonstrates random host and service probing behavior ^[23].
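
The feature-to-label correlation analysis used in this section can be sketched as follows. The records below are invented toy values that merely borrow two NSL-KDD feature names; they are not drawn from the dataset, and the `pearson` helper is a hypothetical stand-in for whatever tooling was used to produce the correlation matrices in the figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Binary class label: 0 = normal, 1 = attack (invented toy records).
label = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "serror_rate": [0.0, 0.0, 0.1, 0.0, 1.0, 0.9, 1.0, 0.8],
    "logged_in":   [1, 1, 1, 0, 0, 0, 0, 0],
}

corr = {name: pearson(values, label) for name, values in features.items()}

# Rank features by the strength of their association with the label.
ranked = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
```

With the label encoded as 1 for attack, a positive coefficient marks a feature that rises in attack traffic, and a negative coefficient marks one associated with normal traffic. In this toy sample, `serror_rate` is strongly positive and `logged_in` is negative, mirroring the direction of the relationships reported above.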

4.3. CICIDS2017

The CICIDS2017 dataset is an intrusion detection benchmark built by the Canadian Institute for Cybersecurity (CIC) in 2017 by simulating a realistic internal network environment. It includes not only normal user traffic but also a variety of then-current attack scenarios such as Brute Force, DoS, DDoS, Web Attack, Botnet, and Infiltration, and each network flow is described by more than 80 features. Correlation analysis showed that features such as $Bwd Packet Length Std$, $PSH Flag Count$, $Packet Length Variance$, $Bwd Packet Length Max$, and $Avg Bwd Segment Size$ exhibited strong positive correlations with attack traffic. This is attributable to the abnormal transmission behavior commonly observed during attacks, where packet sizes and segment lengths vary significantly rather than remaining stable. For example, in DDoS or port scanning attacks, large volumes of irregular packets are repeatedly transmitted within short time intervals, leading to fluctuating and unstable response traffic on the server side. Such instability is reflected in packet length-based metrics through increased variance and elevated mean values. Conversely, features such as $Min Packet Length$, $Bwd Packet Length Min$, $URG Flag Count$, and $Fwd Packet Length Std$ showed negative correlations with the attack label, indicating an association with normal traffic. In typical network communication, packet lengths tend to remain within stable and predictable ranges, and session flows are maintained consistently. As a result, minimum packet size and packet length variance do not fluctuate substantially under normal conditions. Additionally, control flags such as URG rarely appear in regular traffic, and TCP session flows usually exhibit orderly progression. Therefore, these features capture the inherent stability and consistency characteristic of benign network traffic and function as key indicators for distinguishing it from attack behavior ^[24].

Fig. 3. Correlation matrix between features and the binary class label in the CICIDS2017 dataset.

5. Model

According to recent surveys, CNNs and LSTMs are the most widely used models for anomaly detection on benchmark datasets such as NSL-KDD, UNSW-NB15, and CICIDS2017. These datasets consist of large-scale network traffic, where each flow contains dozens of continuous and time-series attributes, including packet length, session duration, transmission speed, and flag state. These data reflect normal traffic patterns and repetitive behaviors, whereas rapid changes or abnormal distributions may occur at specific times or intervals. In addition, the datasets are imbalanced, with normal traffic accounting for the majority of records while attack data exist in various forms, such as DoS, brute force, and web attacks. Therefore, network traffic exhibits multidimensional, time-series, and imbalanced characteristics simultaneously, which are key factors that must be considered when designing models for anomaly detection.
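
One common way to account for class imbalance during training is to weight each class inversely to its frequency in the loss function. The sketch below is not taken from the reviewed studies: the label counts are invented, the function name is hypothetical, and the formula is the widely used "balanced" heuristic, $w_c = N / (K \cdot n_c)$, where $N$ is the number of samples, $K$ the number of classes, and $n_c$ the count of class $c$.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: w_c = N / (K * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Invented label distribution dominated by normal traffic.
labels = ["normal"] * 900 + ["dos"] * 80 + ["brute_force"] * 20
weights = balanced_class_weights(labels)
```

With these counts, the rare `brute_force` class receives a weight about 45 times larger than `normal`, so misclassifying a minority-class record contributes far more to a weighted loss than misclassifying a majority-class one.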

Table 3 categorizes various NIDS models by approach, architecture, key techniques, dataset, learning type, preprocessing or feature selection, and reported metrics. The symbol “–” indicates that the corresponding information was not specified in the original paper.

Table 3. Categorization of NIDS models by learning approach and architecture.

Approach	Main architecture	Model (key techniques)	Dataset	Learning	Preprocessing / feature selection	Metrics (reported)
Classical ML	DT	DT(J48) + BestFirst FS + ANN/KNN/SVM/RF/NB ^[30]	UNSW-NB15	Supervised	One-hot encoding, Min-Max normalization	ACC 86.41%, ADR 97.95%, FAR 27.73%
	RF	GA-based feature selection (C4.5, RF, NBTree) ^[31]	UNSW-NB15	Supervised	Feature selection with GA, transformation	ACC 81.42%, FAR 6.39%
	SVM	XGBoost, SVM ^[32]	CICIDS2017	Supervised	Dataset aggregation, configuration	ACC 99.11%, F1-score 99.19%
Deep learning	CNN	Residual CNN + RNN ^[33]	UNSW-NB15, NSL-KDD	Supervised	One-hot encoding, standardization	DR 97.75%, ACC 86.64%, FAR 1.30%
	CNN	CNN ^[34]	UNSW-NB15	Supervised	Value cleaning, MinMaxScaler, RF-based FS	ACC 99%, Precision 89%, Recall 99%, F1-score 94%
	CNN+LSTM	CNN + LSTM + Attention ^[35]	UNSW-NB15	Supervised	ANOVA F-test, Min-Max, one-hot encoding	F1-score 83%, AUC 88%, Precision 86%, Recall 82%
	LSTM/FNN	LSTM, FNN ^[36]	CICIDS2017, CTU-13	Supervised	NetFlow feature extraction, missing value removal	F1-score 99.703%
	MLP/1D-CNN	MLP, 1D CNN, LOF, OCSVM ^[37]	CICIDS2017	Supervised	Data cleaning, standardization	ACC 97.75%, Precision 98.94%, Recall 90.36%, F1-score 94.46%
	BiLSTM	Lightweight CNN + BiLSTM + $\chi^2$ FS ^[38]	UNSW-NB15	Supervised	Categorical encoding, normalization	ACC 97.90%, Precision 97.91%, Recall 97.90%, F1-score 97.90%
	ARN (RNN)	ARN-based IDS ^[39]	SWaT, UNSW-NB15	Supervised	Word embedding, normalization	ACC 95.48%, Precision 94.96%, Recall 95.45%, F1-score 95.2%
	DBN	Deep Belief Network (DBN) ^[40]	CICIDS2017	Supervised	Class balancing, normalization, RBM pretraining	F1-score 87.3% $\rightarrow$ 94%
	GRU	Gated Recurrent Unit (GRU) ^[41]	CICIDS2017	Supervised	Categorical encoding, standardization	ACC 99.69%, Precision 99.65%, Recall 99.69%, F1-score 99.70%
	CapsNet	Capsule Network (CapsNet) ^[42]	UNSW-NB15	Supervised	One-hot encoding, normalization	ACC 99%, Precision 98%, Recall 99%, F1-score 98%
	DGM	Data Generative Model ^[43]	CICIDS2017	Semi-supervised	Oversampling rare attack types	F1-score 99.92%
	GAN	Generative Adversarial Network (GAN) for IDS ^[44]	CICIDS2017	Semi-supervised	–	–
	LSTM	LSTM with categorical embedding ^[45]	UNSW-NB15	Supervised	Embedding encoding, normalization	ACC 99.7%
	CNN+BiLSTM	Lightweight CNN + BiLSTM ^[46]	UNSW-NB15	Supervised	Categorical encoding, normalization	ACC 97%
	DNN	Self-supervised contrastive DNN ^[47]	UNSW-NB15	Self-supervised	Contrastive pretraining, normalization	ACC 94%
Hybrid / ensemble	Multi-stage	Multi-stage ML pipeline (Oversampling + IG/Correlation FS + HPO) ^[48]	UNSW-NB15	Supervised	Oversampling, IG, correlation-based FS, HPO	–
	MLP+FS	MLP + IG + RF importance $\rightarrow$ RFE Hybrid FS ^[49]	UNSW-NB15	Supervised	Duplicate removal, minority resampling	ACC 82.25%$\rightarrow$84.24%
	Ensemble	Logistic Regression + RF + LSTM + MLP ^[50]	NSL-KDD, UNSW-NB15	Supervised	Scaling, SMOTE, Feature engineering	ACC 97.7%, Recall 96.9%, Precision 99.3%
	LSTM+Opt	LSTM + SGDM + Pruning ^[51]	UNSW-NB15	Supervised	Label encoding, Min-Max normalization	ACC 99.0630%, FAR 0.3913%, DR 88.3317%, F1-score 90.1209%
	RF+XGB	Random Forest + XGBoost Ensemble ^[52]	CICIDS2017	Supervised	PCA-based feature selection	ACC 98.05%
	SAE+SVM	Stacked Autoencoder + SVM ^[53]	UNSW-NB15	Supervised	Feature normalization	ACC 87%, Precision 82%, Recall 79%, F1-score 81%
	CART+RF	CART + Random Forest Hybrid IDS ^[54]	UNSW-NB15, CICIDS2017	Supervised	Feature importance, normalization	ACC 96%
	GAN+CNN+BiLSTM	GAN-based synthetic sampling + CNN+BiLSTM ^[55]	CICIDS2017	Supervised	Oversampling with GAN	Precision 100%, Recall 77%, F1-score 87%
	Transformer+CART	Transformer + wrapper-based FS ^[56]	UNSW-NB15	Supervised	CART wrapper	ACC 93%, Precision 91%, Recall 92%, F1-score 92%
	XGB	XGBoost + Optimized Sequential Neural Network ^[57]	NSL-KDD, UNSW-NB15, CICIDS2017	Supervised	Grid Search HPO, filtering, normalization	ACC 99.93%, F1 99.84%, MCC 99.86%, FPR 0.0004%
Others	Autoencoder	Autoencoder (AE), baseline: One-Class SVM ^[58]	CICIDS2017	Semi / Unsupervised	Train on normal data only	Zero-day ACC 75%–98%
	KNN-based	IPCA + Self-Adjusting Memory KNN (SAM-KNN) ^[59]	UNSW-NB15	Supervised	SMOTE, normalization, oversampling	ACC 98.91%, Precision 98.97%, Recall 99.50%, F1-score 99.23%
	DNN+Tree	ReLU-based DNN + Extra Tree ^[60]	UNSW-NB15	Supervised	Dimensionality reduction	ACC 97.93%, Recall 97%, Precision 97%, F1-score 97%
	CE-GAN	CE-GAN based data augmentation for IDS ^[61]	NSL-KDD, UNSW-NB15	Supervised	Conditional GAN augmentation	PRD 66.1375%, RMSE 0.2243%, MAE 0.1361%
	RNN Comparative	Comparative study of RNN models ^[62]	CICIDS2017, NSL-KDD, UNSW-NB15	Supervised	Standardization	Accuracy across variants
	DNN	Class-wise focal-loss VAE with DNN for IoT IDS ^[63]	NSL-KDD	Supervised	Class-wise Focal Loss, data augmentation	ACC 88.08%, FPR 3.77%, U2R 79.25%, R2L 67.5%
	DNN	Improved Conditional VAE + DNN for DDoS detection ^[64]	NSL-KDD, UNSW-NB15	Supervised	Conditional VAE, DNN classifier	ACC 99.47%, Recall 0.994%, Precision 0.995%
	Fed-Unsupervised-KMeans	Federated unsupervised clustering IDS using k-means ^[65]	CICIDS2017, UNSW-NB15	Unsupervised / Federated	Data normalization, unsupervised variable selection, silhouette-based clustering	Clustering silhouette analysis across datasets
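For readers comparing the metrics reported in Table 3, the following sketch computes them from binary confusion-matrix counts. It assumes the standard definitions with attack as the positive class, so that detection rate (DR) equals recall and false alarm rate (FAR) equals the false positive rate; individual papers may define these slightly differently. The function name `ids_metrics` and the counts are hypothetical.

```python
def ids_metrics(tp, fp, tn, fn):
    """Return common IDS metrics from binary confusion-matrix counts.

    Attack is treated as the positive class. Denominators are assumed non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also reported as detection rate (DR)
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "Precision": precision,
        "Recall/DR": recall,
        "FAR": fp / (fp + tn),       # false alarm rate = false positive rate
        "F1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts: 100 attack flows and 900 benign flows.
m = ids_metrics(tp=90, fp=30, tn=870, fn=10)
for name, value in m.items():
    print(f"{name:10s} {value:.4f}")
```

In this made-up case accuracy is 0.96 while precision is only 0.75, which illustrates why accuracy alone can be misleading on imbalanced traffic and why several rows of the table report F1-score or FAR as well.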

5.1. CNN-based Models

The main reason for using CNN in this study is the multivariate structure and temporal continuity of network traffic, as well as the severe class imbalance commonly found in intrusion detection datasets. In datasets such as UNSW-NB15 and NF-UQ-NIDS-v2, normal traffic accounts for the vast majority of samples, while attack instances are relatively scarce. Moreover, each flow contains many interdependent continuous and categorical features, including packet length, transmission rate, session duration, and TCP flag combinations. Because these features interact with each other, it is difficult for traditional statistical analysis or single-feature detection methods to accurately identify attack behavior. CNN provides two main advantages in addressing these challenges. First, 1D convolution allows CNN to capture local temporal patterns that appear within short time intervals. Many attacks do not change the overall traffic distribution but instead occur as short bursts, repeated attempts, or sudden changes in specific feature combinations. For example, port scanning produces repeated connection attempts within very short intervals; DDoS attacks generate large bursts of packets with specific flag patterns; and web attacks may cause brief changes in payload composition. CNN filters can learn these localized shape patterns and detect subtle anomalies that are difficult to identify with global statistical metrics. Second, CNN can learn the relationships between multiple features as a combined spatial pattern. In many cases, attack traffic is characterized not only by the values of individual attributes but also by how several attributes shift together. For instance, the simultaneous occurrence of a particular TCP flag pattern and a sudden collapse in the packet-to-byte ratio is difficult to detect using a single-feature threshold, whereas CNN can model it as one integrated pattern.
Through this process, irrelevant features are suppressed and important discriminative features are emphasized, allowing the model to maintain high generalization performance even when the dataset is highly imbalanced ^[66]. However, CNN has limitations. Since convolution mainly focuses on local patterns, it is not well suited to modeling long-term dependencies or global changes in traffic behavior. Therefore, in attack scenarios where abnormal signals accumulate slowly over a long period, such as low-rate or stealthy scanning, CNN alone may not be sufficient ^[25, 33, 34].
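The idea that a 1D filter responds to a short local burst can be sketched as follows. This is an illustration under stated assumptions, not a model from the surveyed papers: the kernel is set by hand (a trained CNN would learn its filters), the sequence of packets per interval is synthetic, and `conv1d_relu` is a hypothetical helper.

```python
import numpy as np

def conv1d_relu(signal, kernel):
    """Valid-mode 1D cross-correlation (the 'convolution' of CNN layers) plus ReLU."""
    response = np.correlate(signal, kernel, mode="valid")
    return np.maximum(response, 0.0)

# Synthetic packets-per-interval sequence with a three-step burst at indices 6..8.
packets_per_interval = np.array(
    [5, 5, 5, 5, 5, 5, 50, 55, 50, 5, 5, 5, 5, 5, 5, 5], dtype=float
)

# Hand-set zero-sum kernel: a flat baseline gives no response, while a
# three-step rise flanked by quiet intervals gives a strong one.
burst_kernel = np.array([-1.5, 1.0, 1.0, 1.0, -1.5])

activation = conv1d_relu(packets_per_interval, burst_kernel)
peak = int(np.argmax(activation))
print("activation:", activation)
print("strongest response for the window starting at index", peak)
```

Because the kernel only sees five consecutive intervals, it reacts to the burst but carries no information about behavior outside that window, which is the locality limitation noted above.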

5.2. LSTM-based Models

Long short-term memory (LSTM) is a recurrent neural network structure proposed to solve the vanishing gradient problem of the conventional recurrent neural network (RNN). It preserves long-term dependence in a time series through a memory cell state that is regulated by an input gate, a forget gate, and an output gate. Thanks to this structure, LSTM can retain important temporal information for a long time while selectively discarding unnecessary information, which makes pattern learning more stable than in a general RNN.
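The gate mechanism described above can be written out as a single time step. The sketch below follows the standard LSTM formulation; the weight layout (one stacked matrix `W` for all four gates), the random weights, and the synthetic input sequence are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * hidden, hidden + n_features) and b has shape (4 * hidden,),
    stacked in the order: input gate, forget gate, output gate, candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:hidden])                 # input gate: how much new content to write
    f = sigmoid(z[hidden:2 * hidden])        # forget gate: how much old state to keep
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate: how much state to expose
    g = np.tanh(z[3 * hidden:])              # candidate cell content
    c = f * c_prev + i * g                   # updated memory cell state
    h = o * np.tanh(c)                       # updated hidden state
    return h, c

rng = np.random.default_rng(0)
n_features, hidden = 4, 3
W = rng.normal(0, 0.5, size=(4 * hidden, hidden + n_features))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(20, n_features)):  # 20 steps of synthetic flow features
    h, c = lstm_step(x_t, h, c, W, b)
print("final hidden state:", h)
```

The additive update of `c` is what lets information persist: when the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step almost unchanged. The loop also makes the sequential cost visible, since each step depends on the previous one.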

Although signs of attacks in network traffic can appear abruptly at a single point in time, in actual environments they more commonly change gradually over time or appear as repetitions of specific patterns. For example, low-speed port scanning exhibits repeated connection attempts at regular intervals, and C2 (command and control) beacon signals show periodic communication patterns within a long session. In addition, data leakage attacks can be carried out in such a way that the amount of transmitted data gradually increases or the session becomes abnormally long. These anomalies may appear normal when only individual packets are examined, but they can be identified by considering temporal changes throughout the entire session ^[26, 36]. However, LSTM also has its limitations. First, as the length of the time sequence increases, the amount of computation accumulates, resulting in scalability problems such as increased computational cost and processing delay when handling large real-time traffic streams. Second, although long-term dependence can theoretically be learned, in real complex network environments long-term information is gradually diluted, and complete preservation of long-term context is not guaranteed. For this reason, recent studies have actively explored approaches that extend LSTM into GC-LSTM structures combined with Graph Convolutional Networks, or into transformer-based models that can learn long-term dependence more efficiently ^[67, 68].

5.3. GAN-based Models

Network intrusion detection datasets share a structural problem. Most of the traffic generated in a real network environment is normal communication, and attack behavior accounts for only an extremely small percentage. This scarcity of attack samples limits the opportunity for a model to fully learn attack patterns, greatly degrading the detection of new or modified attacks. In addition, there is a class imbalance problem in which the ratio between normal traffic and attack traffic is extremely asymmetric. In particular, because rare attack classes such as Botnet and Infiltration have very few samples in the dataset, supervised detection models tend to be trained with a bias toward normal traffic, which eventually lowers the recall of the attack classes ^[44]. This structural problem is even more evident in the CESNET-TimeSeries24 dataset, a time-series dataset collected over a long period that consists almost entirely of normal traffic, with few samples labeled as attacks. In this case, the model learns only a narrow distribution of normal patterns, so an actual attack is highly likely to go undetected and be treated as a temporary fluctuation of the normal pattern. Moreover, the abnormal patterns appearing in this dataset are not numerical outliers at a single point in time but continuous temporal pattern changes, such as changes in average delay time, packet burst occurrences, collapsed traffic periodicity, and inter-arrival time distortion. These changes cannot be reproduced with traditional oversampling techniques such as simple feature replication or SMOTE ^[27].

In this study, a Generative Adversarial Network (GAN) was used to solve this problem. A GAN can probabilistically generate new attack samples while maintaining the distributional characteristics and intrinsic patterns of real data through competitive learning between the generator and the discriminator. Unlike simple data replication, GAN-generated samples are finely varied while preserving existing attack patterns, helping the detection model learn more generalized representations of attack behavior. As a result, stable learning is possible even for rare attack classes, and the recall and F1-score of the detection model are significantly improved.
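The competition between generator and discriminator can be made concrete through the two loss terms. The sketch below uses the binary cross-entropy losses of the original GAN formulation with the common non-saturating generator loss; this choice is an assumption, since variants such as WGAN or CTGAN use different objectives. The discriminator outputs are made-up probabilities, not results from any surveyed model.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy loss: push D(real) toward 1 and D(fake) toward 0."""
    return float(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps)))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: reward fakes that the discriminator accepts."""
    return float(-np.mean(np.log(d_fake + eps)))

# Early training (hypothetical): the discriminator separates real from fake easily.
d_real = np.array([0.9, 0.8])   # D's probability that real attack samples are real
d_fake = np.array([0.1, 0.2])   # D's probability that generated samples are real
print("D loss:", round(discriminator_loss(d_real, d_fake), 4))
print("G loss:", round(generator_loss(d_fake), 4))

# Idealized equilibrium: the discriminator can no longer tell the two apart.
half = np.array([0.5, 0.5])
print("D loss at equilibrium:", round(discriminator_loss(half, half), 4))
print("G loss at equilibrium:", round(generator_loss(half), 4))
```

A low discriminator loss paired with a high generator loss, as in the first case, means the generated samples are still easy to reject. Training alternates updates to the two networks until neither can improve easily.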

However, there are also limitations in applying GANs. First, there is a risk that the generator will repeatedly create only a few patterns due to mode collapse. In this case, the diversity of the generated data is low, so it does not sufficiently reflect the various modified attacks that can occur in a real network environment. Second, more advanced GAN models do not always guarantee better performance; the outcome depends on the complexity of the data distribution. In this study, CTGAN was advantageous for learning complex distributions, but Vanilla GAN and WGAN worked more stably in segments with simple distributions. Third, the performance improvement was not proportional to the amount of generated data. Performance improved significantly when the attack data was initially expanded fourfold, but the gains diminished as the expansion was increased to 49 times and 99 times ^[28, 29].

Taken together, GANs are a valid approach to alleviating the scarcity of rare attack data and class imbalance and to improving the generalization ability of detection models. However, securing the diversity of generated data, selecting models based on the characteristics of the data distribution, and optimizing the volume of generated data remain important challenges ^[55, 69].

6. Challenges and Future Directions

Traditional deep learning-based intrusion detection studies have primarily employed CNN, LSTM, and GAN architectures. However, these models fundamentally struggle to capture the high-dimensional correlation structures, multi-protocol interactions, and dynamic temporal evolution inherent in network traffic data. CNN-based models demonstrate strong performance in extracting local features through convolutional filters, yet they are limited in modeling long-range behavioral patterns that accumulate progressively across a session. LSTM models can preserve sequential information through recurrent structures; however, as sequence length increases, they suffer from gradient vanishing and significant computational overhead, which makes them impractical for high-speed and large-scale real-time network environments. GAN-based approaches can help mitigate data imbalance for rare and zero-day attacks, yet they often exhibit unstable training behavior and frequent mode collapse, hindering their ability to capture diverse attack patterns accurately.

In contrast, Transformer architectures leverage self-attention mechanisms to directly model global dependencies within input sequences, effectively mitigating the local-pattern bias of CNNs and the long-range dependency issues of LSTMs. This structural advantage enables richer representation of complex feature relationships in network traffic, such as protocol-field interactions, payload-level flow behaviors, and cross-session correlations. However, Transformer-only models still experience performance degradation under extreme class-imbalance conditions, particularly when detecting rare attack events ^[73].
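How self-attention relates every position in a sequence to every other position can be sketched with single-head scaled dot-product attention. The projection matrices are random and the input is a synthetic sequence of flow-feature vectors, so this is an assumption-laden illustration rather than a trained detector.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a (T, d_in) sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])    # (T, T): every step scored against every step
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d_in, d_k = 6, 8, 4                        # 6 time steps, 8 features per step
X = rng.normal(size=(T, d_in))
Wq, Wk, Wv = (rng.normal(size=(d_in, d_k)) for _ in range(3))

context, weights = self_attention(X, Wq, Wk, Wv)
print(weights.round(2))
```

The `(T, T)` weight matrix is what gives direct access to distant time steps in one operation, in contrast to the step-by-step recurrence of an LSTM. Its size also grows quadratically with the sequence length.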

To address the class-imbalance limitation of Transformer-only models, contrastive learning-based representation methods have gained increasing attention. Contrastive learning enables clear separation between normal and abnormal traffic instances in the representation space, even with limited labeled data. By first learning the intrinsic clustering structure of normal flows, deviations from this structure can be effectively identified as anomalies. As a result, these methods significantly improve generalization to rare attacks, mutated threats, and zero-day intrusions ^[75].
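One common way to express the separation objective is an InfoNCE-style loss, which is low when an anchor embedding is closer to its positive than to any negative. Using InfoNCE here is an assumption; the cited work may use a different contrastive objective. The two-dimensional embeddings below are hand-picked for illustration.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, using cosine similarity."""
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    a, p, n = unit(anchor), unit(positive), unit(negatives)
    # Position 0 holds the positive pair; the rest are negative pairs.
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    logits = logits - logits.max()            # numerical stability
    return float(-logits[0] + np.log(np.exp(logits).sum()))

anchor = np.array([1.0, 0.0])                 # embedding of a normal flow
augmented = np.array([0.9, 0.1])              # a slightly perturbed view of the same flow
others = np.array([[-1.0, 0.0], [0.0, 1.0]])  # embeddings of different flows

well_separated = info_nce(anchor, augmented, others)
# Same vectors, but now the "positive" is far away and a negative is close.
poorly_separated = info_nce(anchor, others[0], np.array([augmented, others[1]]))
print("loss, well separated:  ", round(well_separated, 4))
print("loss, poorly separated:", round(poorly_separated, 4))
```

Minimizing this loss pulls views of the same flow together and pushes other flows away without requiring attack labels, which is why such objectives are attractive when labeled attack data are scarce.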

Furthermore, recent Large Language Model (LLM)-based approaches provide strong semantic understanding and behavioral reasoning capabilities derived from large-scale pretraining. LLMs are highly effective in interpreting unstructured data such as logs, packet text, and system events. Their zero-shot and few-shot learning capabilities offer robust adaptability to new or previously unseen attack types, even under limited labeling conditions. Additionally, LLMs support multimodal integration of packet payloads, traffic metadata, and log information, providing superior versatility, scalability, and explainability compared to conventional CNN, LSTM, and GAN-based models ^[76].

However, advancing these methodological directions requires modern datasets that accurately reflect today’s heterogeneous, encrypted, distributed, and high-speed network environments. Existing benchmark datasets such as NSL-KDD, UNSW-NB15, and CICIDS2017 are widely used but remain limited due to synthetic traffic patterns and outdated attack types. To overcome these limitations, future work will incorporate more recent real-world datasets:

CESNET-TimeSeries24 (2024): A large-scale ISP backbone traffic dataset collected over 40 weeks, containing traffic from more than 275,000 active IP addresses, 6.6 billion flows, and 4 trillion packets. Each sample is aggregated into multiresolution time-series with twelve key statistical behavioral features. The dataset supports anomaly detection at IP, subnet, and institutional network scales and includes point, contextual, collective, and trend anomalies, offering a realistic benchmark for evaluating temporal anomaly detection models ^[71].
NF-UQ-NIDS-v2 (2023–2024): A unified intrusion detection dataset that integrates diverse traffic environments into a single large-scale benchmark. It contains 11,994,893 flow records described by 43 statistical features, spanning 10 attack categories including DDoS, infiltration, botnet, brute-force, and web-based exploits. The dataset consists of 9,208,048 normal flows and 2,786,845 attack flows, providing a realistic distribution of benign and malicious traffic. Its diversity and volume enable robust evaluation of models under realistic multi-attack, multi-protocol conditions ^[72].
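The flow counts quoted for NF-UQ-NIDS-v2 can be turned into an imbalance ratio and a set of class weights. The inverse-frequency weighting used here is a common heuristic and an assumption of this sketch, not a procedure prescribed by the dataset authors.

```python
# Flow counts for NF-UQ-NIDS-v2 as stated in the text.
normal_flows = 9_208_048
attack_flows = 2_786_845
total = normal_flows + attack_flows

attack_share = attack_flows / total
imbalance_ratio = normal_flows / attack_flows

# Inverse-frequency class weights: n_samples / (n_classes * n_samples_in_class).
n_classes = 2
class_weights = {
    "normal": total / (n_classes * normal_flows),
    "attack": total / (n_classes * attack_flows),
}

print(f"total flows:      {total:,}")
print(f"attack share:     {attack_share:.1%}")
print(f"normal : attack = {imbalance_ratio:.2f} : 1")
print("class weights:   ", {k: round(v, 3) for k, v in class_weights.items()})
```

The two counts sum to the 11,994,893 flow records given in the text, and attack flows make up a little under a quarter of the binary split. Individual attack categories within that quarter can still be far rarer, so per-class weights would need the per-category counts.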

7. Conclusion

This study comprehensively reviewed research trends in network anomaly detection, focusing on major benchmark datasets such as NSL-KDD, UNSW-NB15, and CICIDS2017. The analysis showed that CNN-based models achieve relatively stable performance even on high-dimensional and imbalanced data owing to their strength in learning local features and hierarchical representations, but they are limited in reflecting the long-term attack behavior patterns that unfold over an entire session. LSTM-based models can effectively model time-series patterns and long-term dependence, but their computational cost grows rapidly as the sequence length increases, which limits their application in large-scale real-time network environments. In addition, GAN-based approaches can alleviate the data imbalance problem for rare and zero-day attacks, but training instability and mode collapse make it difficult to reliably reflect the actual attack distribution. Furthermore, network traffic itself exhibits severe class imbalance, multivariate and high-dimensional characteristics, and real-time detection requirements, which limit the generalization performance of existing models and their practical applicability. To compensate for these limitations, recent studies have focused on Transformer-based time-series learning, which enables direct modeling of global correlations and efficient parallel processing through self-attention. In addition, by clearly separating the representation space between normal and abnormal traffic, contrastive learning improves generalization to rare and zero-day attacks and offers higher training stability than GAN-based augmentation methods.
Moreover, LLM-based multimodal approaches provide integrated understanding of unstructured information such as logs, packet text, and metadata, demonstrate strong adaptability even in label-scarce environments through zero-shot and few-shot inference, and offer superior model interpretability compared to conventional deep learning models. Therefore, future network anomaly detection research is expected to advance in the following directions:

Learning structural representations and enhancing data augmentation strategies to address data imbalance
Designing lightweight, real-time detection models based on Transformers and contrastive learning
Establishing an evaluation system that considers model interpretability and real-world deployability

Acknowledgement

This paper was supported by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (No. RS-2021-KI002499, HRD Program for Industrial Innovation).

References

S. M. Kasongo , Y. Sun , Performance analysis of intrusion detection systems using a feature selection method on the UNSW-NB15 dataset, Journal of Big Data, Vol. 7, pp. 105, 2020

S. García , M. Grill , J. Stiborek , A. Zunino , An empirical comparison of botnet detection methods, Computers & Security, Vol. 45, pp. 100-123, 2014

J. Yu , X. Gao , B. Li , F. Zhai , J. Lu , B. Xue , S. Fu , C. Xiao , A filter-augmented auto-encoder with learnable normalization for robust multivariate time series anomaly detection, Neural Networks, Vol. 170, pp. 478-493, 2024

L. Yu , Q. Lu , Y. Xue , DTAAD: dual TCN-attention networks for anomaly detection in multivariate time series data, Knowledge-Based Systems, Vol. 295, No. 111849, 2024

S. Liu , B. Zhou , Q. Ding , B. Hooi , Z. Zhang , H. Shen , X. Cheng , Time series anomaly detection with adversarial reconstruction networks, IEEE Transactions on Knowledge and Data Engineering, Vol. 35, No. 4, pp. 4293-4306, 2022

M. Munir , S. A. Siddiqui , A. Dengel , S. Ahmed , DeepAnT: A deep learning approach for unsupervised anomaly detection in time series, IEEE Access, Vol. 7, pp. 1991-2005, 2019

D. L. Marino , C. S. Wickramasinghe , C. Rieger , M. Manic , Self-supervised and interpretable anomaly detection using network transformers, IEEE Transactions on Industrial Informatics, Vol. 21, No. 5, pp. 4252-4261, 2025

L. Xu , K. Xu , Y. Qin , Y. Li , X. Huang , Z. Lin , X. Ji , TGAN-AD: transformer-based GAN for anomaly detection of time series data, Applied Sciences, Vol. 12, No. 16, 2022

G. G. González , P. Casas , E. Martínez , A. Fernández , Towards foundation auto-encoders for time-series anomaly detection, arXiv preprint, 2025

B. Golchin , B. Rekabdar , Anomaly detection in time series data using reinforcement learning, variational autoencoder, and active learning, Proc. of 2024 Conference on AI, Science, Engineering, and Technology (AIxSET), 2025

S. D. D. Anton , S. Sinha , H. D. Schotten , Anomaly-based intrusion detection in industrial data with SVM and random forests, Proc. of the International Conference on Software, Telecommunications and Computer Networks, 2019

S. D. Anton , L. Ahrens , D. Fraunholz , H. D. Schotten , Time is of the essence: machine learning-based intrusion detection in industrial time series data, Extended version of a publication in the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 1-6, 2018

K. Tscharke , M. Wendlinger , A. Ahouzi , P. Bhardwaj , K. Amoi-Taleghani , M. Schrödl-Baumann , P. Debus , Quantum autoencoder for multivariate time series anomaly detection, Proc. of 2025 IEEE International Conference on Quantum Computing and Engineering (QCE), 2025

Z. Z. Darban , G. I. Webb , S. Pan , C. Aggarwal , M. Salehi , Deep learning for time series anomaly detection: A survey, ACM Computing Surveys, Vol. 57, No. 1, pp. 1-42, 2025

Y. Qin , D. Song , H. Chen , W. Cheng , G. Jiang , G. Cottrell , A dual-stage attention-based recurrent neural network for time series prediction, arXiv preprint arXiv:1704.02971, 2017

S. Hochreiter , J. Schmidhuber , Long short-term memory, Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997

X. Xu , H. Wang , Y. Liang , P. S. Yu , Y. Zhao , K. Shu , Can multimodal LLMs perform time series anomaly detection?, Proc. of the ACM Web Conference, pp. 5392-5403, 2026

V. Chandola , A. Banerjee , V. Kumar , Anomaly detection: A survey, ACM Computing Surveys, Vol. 41, No. 3, pp. 1-58, 2009

K. Haukat , T. M. Alam , S. Luo , S. Shabbir , I. Hameed , J. Li , S. Abbas , U. Javed , , Advances in Information and Communication, Vol. 1363, 2021

G. Ciaburro , G. Iannace , Machine learning-based algorithms to knowledge extraction from time series data: A review, Data, Vol. 6, No. 55, 2021

F. Wang , Y. Jiang , R. Zhang , A. Wei , J. Xie , X. Pang , A survey of deep anomaly detection in multivariate time series: taxonomy, applications, and directions, Sensors, Vol. 25, No. 190, 2025

N. Moustafa , J. Slay , UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1-6, 2015

M. Tavallaee , E. Bagheri , W. Lu , A. A. Ghorbani , A detailed analysis of the KDD CUP 99 data set, Proc. of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1-6, 2009

I. Sharafaldin , A. H. Lashkari , A. A. Ghorbani , Toward generating a new intrusion detection dataset and intrusion traffic characterization, Proc. of the International Conference on Information Systems Security and Privacy (ICISSP), Vol. 1, pp. 108-116, 2018

S. Bai , J. Z. Kolter , V. Koltun , An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, arXiv preprint arXiv:1803.01271, 2018

B. Radford , L. Apolonio , A. Trias , J. Simpson , Network traffic anomaly detection using recurrent neural networks, arXiv preprint arXiv:1803.10769, 2018

B. Zhou , S. Liu , B. Hooi , X. Cheng , J. Ye , BeatGAN: anomalous rhythm detection using adversarially generated time series, Proc. of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pp. 4433-4439, 2019

I. Sharafaldin , A. Gharib , A. H. Lashkari , A. A. Ghorbani , Towards a reliable intrusion detection benchmark dataset, Software Networking, Vol. 2018, No. 1, pp. 177-200, 2018

A. Thakkar , R. Lohiya , A review on machine learning and deep learning perspectives of IDS for IoT: recent updates, security issues, and challenges, Archives of Computational Methods in Engineering, Vol. 28, No. 4, pp. 3211-3243, 2021

M. A. Umar , Z. Chen , Y. Liu , Network intrusion detection using wrapper-based decision tree for feature selection, Proc. of the 2020 International Conference on Internet Computing for Science and Engineering, pp. 5-13, 2020

C. Khammassi , S. Krichen , A GA-LR wrapper approach for feature selection in network intrusion detection, Computers & Security, Vol. 70, pp. 255-277, 2017

S. Farhat , M. Abdelkader , A. Meddeb-Makhlouf , F. Zarai , Evaluation of DoS/DDoS attack detection with ML techniques on CIC-IDS2017 dataset, Proc. of the International Conference on Information Systems Security and Privacy (ICISSP), pp. 287-295, 2023

P. Wu , H. Guo , N. Moustafa , Pelican: A deep residual network for network intrusion detection, Proc. of the 2020 IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 55-62, 2020

A. D. Vibhute , M. Khan , C. H. Patil , S. V. Gaikwad , A. V. Mane , K. K. Patel , Network anomaly detection and performance evaluation of convolutional neural networks on UNSW-NB15 dataset, Procedia Computer Science, Vol. 235, pp. 2227-2236, 2024

K. Psychogyios , A. Papadakis , S. Bourou , N. Nikolaou , A. Maniatis , T. Zahariadis , Deep learning for intrusion detection systems (IDSs) in time series data, Future Internet, Vol. 16, No. 3, 2024

A. Corsini , S. J. Yang , G. Apruzzese , On the evaluation of sequential machine learning for network intrusion detection, Proc. of the 16th International Conference on Availability, Reliability and Security (ARES), pp. 1-10, 2021

Z. Xu , Y. Liu , Robust anomaly detection in network traffic: evaluating machine learning models on CICIDS2017, Proc. of 2025 10th International Conference on Electronic Technology and Information Science (ICETIS), 2025

M. Jouhari , H. Benaddi , K. Ibrahimi , Efficient intrusion detection: combining X2 feature selection with CNN-BiLSTM on the UNSW-NB15 dataset, Proc. of the 2024 11th International Conference on Wireless Networks and Mobile Communications (WINCOM), pp. 1-6, 2024

Z. Liu , D. Ye , C. Yang , Y. Ding , Y. Liu , L. Tang , C. Chen , Simplicity over complexity: an ARN-based intrusion detection method for industrial control network, arXiv preprint arXiv:2412.14669, 2024

O. Belarbi , A. Khan , P. Carnelli , T. Spyridopoulos , An intrusion detection system based on deep belief networks, Proc. of 4th International Conference on Science of Cyber Security, pp. 377-392, 2022

B. Cao , C. Li , Y. Song , Y. Qin , C. Chen , Network intrusion detection model based on CNN and GRU, Applied Sciences, Vol. 12, pp. 4184, 2022

M. Khan , A. Rahman , S. Lee , Improving intrusion detection with hybrid deep learning models: A study on CIC-IDS2017, UNSW-NB15, and KDD CUP 99, Journal of Information Systems Engineering and Management, Vol. 10, No. 11s, pp. 1-12, 2025

A. S. Barkah , S. R. Selamat , Z. Z. Abidin , R. Wahyudi , Data generative model to detect the anomalies for IDS imbalance CICIDS2017 dataset, TEM Journal, Vol. 12, No. 1, pp. 1-7, 2023

M. Al-Ajlan , M. Ykhlef , A review of generative adversarial networks for intrusion detection systems: advances, challenges, and future directions, Computers, 2024

H. Gwon , C. Lee , R. Keum , H. Choi , Network intrusion detection based on LSTM and feature embedding, arXiv preprint, 2019

M. Jouhari , M. Guizani , Lightweight CNN-BiLSTM based intrusion detection systems for resource-constrained IoT devices, Proc. of the 2024 International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 1558-1563, 2024

S. Lotfi , M. Modirrousta , S. Shashaani , M. A. Shoorehdeli , Network intrusion detection with limited labeled data using self-supervision, arXiv preprint, 2022

M. Injadat , A. Moubayed , A. B. Nassif , A. Shami , Multi-stage optimized machine learning framework for network intrusion detection, IEEE Transactions on Network and Service Management, Vol. 18, No. 2, pp. 1803-1816, 2020

Y. Yin , J. Jang-Jaccard , W. Xu , A. Singh , J. Zhu , F. Sabrina , J. Kwak , IGRF-RFE: a hybrid feature selection method for MLP-based network intrusion detection on UNSW-NB15 dataset, Journal of Big Data, Vol. 10, No. 1, 2023

B. Tafreshian , S. Zhang , A defensive framework against adversarial attacks on machine learning-based network intrusion detection systems, Proc. of the 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 2436-2441, 2024

T. T. Huynh , T. Nguyen Hoang , Effective multi-stage training model for edge computing devices in intrusion detection, International Journal of Computer Networks & Communications, Vol. 16, 2024

C. S. Sampath , P. Anuradha , Intrusion detection using machine learning: A random forest-based approach, International Journal for Multidisciplinary Research, Vol. 5, No. 3, pp. 1-6, 2023

N. Fathima , A. Pramod , Y. Srivastava , A. M. Thomas , Two-stage deep stacked autoencoder with shallow learning for network intrusion detection system, arXiv preprint, 2021

R. Mohammad , F. Saeed , A. A. Almazroi , F. S. Alsubaei , A. A. Almazroi , Enhancing intrusion detection systems using a deep learning and data augmentation approach, Systems, Vol. 12, No. 3, 2024

X. Zhao, K. W. Fok, V. L. L. Thing, Enhancing network intrusion detection performance using generative adversarial networks, Computers & Security, Vol. 145, 2024

M. Umer, M. Tahir, M. Sardaraz, M. Sharif, H. Elmannai, A. D. Algarni, Network intrusion detection model using wrapper based feature selection and multi head attention transformers, Scientific Reports, Vol. 15, No. 1, 2025

F. S. Alsubaei, Smart deep learning model for enhanced IoT intrusion detection, Scientific Reports, Vol. 15, No. 1, 2025

H. Hindy, R. Atkinson, C. Tachtatzis, J. N. Colin, E. Bayne, X. Bellekens, Utilising deep learning techniques for effective zero-day attack detection, Electronics, Vol. 9, No. 10, 2020

P. R. Agbedanu, R. Musabe, J. Rwigema, I. Gatare, Y. Pavlidis, IPCA-SAMKNN: A novel network IDS for resource constrained devices, Proc. of the 2022 2nd International Seminar on Machine Learning, Optimization, and Data Science (ISMODE), pp. 540-545, 2022

M. Farhan, H. Waheed Ud Din, S. Ullah, M. S. Hussain, M. A. Khan, T. Mazhar, Network-based intrusion detection using deep learning technique, Scientific Reports, Vol. 15, No. 1, 2025

Y. Yang, X. Liu, D. Wang, Q. Sui, C. Yang, H. Li, A CE-GAN-based approach to address data imbalance in network intrusion detection systems, Scientific Reports, Vol. 15, No. 1, 2025

M. Tayebi, S. El Kafhali, Performance analysis of recurrent neural networks for intrusion detection systems in Industrial Internet of Things, Franklin Open, Vol. 12, 2025

S. Khanam, I. Ahmedy, M. Y. I. Idris, M. H. Jaward, Towards an effective intrusion detection model using focal loss variational autoencoder for internet of things (IoT), Sensors, Vol. 22, No. 15, 2022

C. Haripriya, M. P. Jagadeesh, An efficient autoencoder-based deep learning technique to detect network intrusions, International Transaction Journal of Engineering, Management, & Applied Sciences & Technologies, Vol. 13, No. 7, pp. 1-10, 2022

M. Gourceyraud, R. B. Salem, C. Neal, F. Cuppens, N. B. Cuppens, Federated intrusion detection system based on unsupervised machine learning, arXiv preprint, 2025

H. Chen, G.-R. You, Y.-R. Shiue, Hybrid intrusion detection system based on data resampling and deep learning, International Journal of Advanced Computer Science and Applications, Vol. 15, No. 2, 2024

W. Choukri, H. Lamaazi, N. Benamar, Abnormal network traffic detection using deep learning models in IoT environment, Proc. of the 2021 3rd IEEE Middle East and North Africa Communications Conference (MENACOMM), pp. 98-103, 2021

T. Sharma, S. Gandage, Network traffic classification using long-short term memory algorithm on UNSW-NB15 and KDD-CUP99 dataset, Mathematical Statistician and Engineering Applications, Vol. 71, No. 4, pp. 10166-10181, 2022

L. Xu, M. Skoularidou, A. Cuesta-Infante, K. Veeramachaneni, Modeling tabular data using conditional GAN, Proc. of the 33rd International Conference on Neural Information Processing Systems (NeurIPS), pp. 7335-7345, 2019

X. Zhao, K. W. Fok, V. L. L. Thing, Enhancing network intrusion detection performance using generative adversarial networks, Computers & Security, Vol. 145, pp. 104005, 2024

J. Koumar, K. Hynek, T. Cejka, P. Šiška, CESNET-TimeSeries24: time series dataset for network traffic anomaly detection and forecasting, Scientific Data, Vol. 12, No. 1, pp. 338, 2025

J. Krupski, M. Iwanowski, W. Graniszewski, Extraction of minimal set of traffic features using ensemble of classifiers and rank aggregation for network intrusion detection systems, Applied Sciences, Vol. 14, No. 16, pp. 6995, 2024

S.-M. Tseng, Y.-Q. Wang, Y.-C. Wang, Multi-class intrusion detection based on transformer for IoT networks using CIC-IoT-2023 dataset, Future Internet, Vol. 16, No. 8, pp. 284, 2024

W.-S. Park, G.-N. Kim, S. Lee, Intrusion detection system based on packet payload analysis using transformer, Journal of The Korea Society of Computer and Information, Vol. 28, No. 11, pp. 81-87, 2023

X. Tan, J. Cheng, H. Li, Y. Yang, Contrastive learning for network intrusion detection: A comprehensive survey, Proc. of the 2024 2nd International Conference on Computing, Internet of Things and Smart City (CIoTSC), pp. 160-166, 2025

M. A. Rahman, A survey on security and privacy of multimodal LLMs: connected healthcare perspective, Proc. of the 2023 IEEE Globecom Workshops (GC Wkshps), pp. 1807-1812, 2023

Seoyeon Choi

Seoyeon Choi is currently an undergraduate student at Kwangwoon University, majoring in computer information engineering. Her research interests include cryptography, network security, and cyber security.

Songhye Kim

Songhye Kim is currently pursuing a B.S. degree in computer information engineering at Kwangwoon University, Seoul, Republic of Korea. Her research interests include cryptography, cybersecurity, and vulnerability analysis.

Jihyeon Ryu

Jihyeon Ryu is an assistant professor with the School of Computer and Information Engineering, Kwangwoon University. She received her B.S. degree in mathematics and computer science and her Ph.D. degree in cyber security from Sungkyunkwan University, Korea. Her research interests include cyber security, machine learning, and user authentication.