SSL Encryption Traffic Attack Behavior Recognition Method Based on Traffic Behavior Characteristics
Weijie Song1*
Zufeng Hou1
Sixiao Guo1
Zhige Liao1
Jiadong Yan1
(Guangdong Power Grid Co., Ltd. Zhuhai Power Supply Bureau, Zhuhai 519075, Guangdong, China)
Copyright © 2026 The Institute of Electronics and Information Engineers (IEIE)
Keywords
Traffic behavior characteristics, SSL, Encrypted traffic, Attack behavior, Identification method
1. Introduction
As digital transformation accelerates, cyberspace has become an integral part of critical
national infrastructure, business operations and personal life. With the widespread
application of emerging technologies such as cloud computing, the Internet of Things,
and big data, the scale and complexity of network traffic have increased exponentially,
and the network environment has become more open and interconnected. Although this
highly interconnected nature greatly promotes information sharing and business innovation,
it also opens the door for cyber attackers, making cybersecurity threats increasingly
serious [1].
In the past few years in particular, the widespread adoption of the SSL/TLS encryption protocols has
become a double-edged sword for network security. On the one hand, the
widespread deployment of encryption technology effectively improves the privacy and
integrity of data transmission, protects user privacy, and combats security threats
such as man-in-the-middle attacks. Encryption, on the other hand, has also become
a safe haven for malicious actors who exploit the invisibility of encrypted traffic
to hide attacks such as distributed denial of service (DDoS) attacks, advanced persistent
threats (APT), ransomware propagation, data breaches, and more. These attacks are
more difficult for traditional security devices to detect and block under the cover
of encryption, because such devices mostly rely on analyzing plaintext payloads and
cannot inspect encrypted content. Encryption standards also keep evolving: TLS 1.3,
for example, further strengthens security but brings new challenges. Higher
encryption efficiency and stronger privacy protection mean less observable metadata,
which increases the difficulty of identifying malicious behavior in encrypted traffic
[2]. The SSL process is shown in Fig. 1.
In short, the widespread use of SSL/TLS encryption is a double-edged sword: it enhances
data privacy but also provides cover for cyber attackers. Traditional security methods
struggle to detect malicious activities in encrypted traffic, leaving a significant
research gap. Our objectives are to develop advanced feature extraction techniques,
optimize feature selection, enhance model fusion using ensemble learning, and evaluate
the performance of the proposed method. This work aims to provide a robust solution for
detecting and mitigating encrypted traffic attacks and thereby to improve overall network
security.
Faced with these challenges, academia and industry are actively exploring and studying
how to efficiently identify and defend against malicious behavior in SSL/TLS encrypted
traffic without compromising user privacy. This requires not only a deep understanding
of the underlying characteristics of encrypted traffic, but also a combination of
advanced data analytics and intelligent algorithms to catch attack patterns that are otherwise hard to detect.
Therefore, developing identification methods that both protect user privacy and
effectively detect attacks in encrypted traffic has become a leading topic in current
network security research, with considerable theoretical significance and practical
value [3].
Given the complexity and importance of identifying malicious activity in encrypted
traffic, this study proposes a novel identification method based on traffic
behavior characteristics. By analyzing the behavior patterns of encrypted traffic in
depth and extracting discriminative features, the method aims to improve the detection
of hidden attack behavior and thus support network security protection systems. This
research can improve the response speed and accuracy of network defense systems,
reduce false positive and false negative rates, and help detect and block potential
security threats in time, thereby protecting user data and maintaining the stability
and health of the network environment.
This research designs and implements an SSL encrypted traffic attack behavior
recognition system. Its core content covers four key aspects. First, by mining the
temporal behavior, statistical attributes, and connection patterns of SSL/TLS encrypted
traffic, a highly discriminative feature set is selected and optimized so that
recognition accuracy improves without sacrificing computational efficiency. Second,
machine learning and deep learning techniques such as random forests, gradient boosting
trees, convolutional neural networks, and recurrent neural networks are used to build
and optimize models that learn and predict complex attack behavior patterns in encrypted
traffic. Third, a real-time monitoring system is built to analyze and respond to network
traffic immediately. Fourth, a comprehensive performance evaluation framework is
established to verify the effectiveness of the proposed method in practical application
scenarios [4].
The innovations of this research are twofold. First, it integrates traditional
statistical features with novel temporal behavior features to create a multi-dimensional
feature space, which strengthens the recognition of covert attack behavior in encrypted
traffic. Second, it introduces a model fusion strategy and an adaptive learning
mechanism, which improve the robustness of the model by combining the advantages of
multiple algorithms and allow the model to optimize itself as network conditions change
and new threats emerge.
2. Related Work
2.1. SSL/TLS Protocol and Its Security Features
The SSL/TLS protocol is the cornerstone of modern Internet communication security. Its
core operation begins with an elaborate handshake that performs secure key exchange,
selects the cipher suite (which determines encryption strength), and authenticates the
server to prevent man-in-the-middle attacks. After the handshake, the negotiated keys and
algorithms are used to encrypt the transmission, ensuring the confidentiality and
integrity of the information. Security analysis is key to evaluating the robustness of
SSL/TLS. The release of the latest version of the protocol, TLS 1.3, marks a significant
advance in security: it not only eliminates known weaknesses such as outdated encryption
algorithms but also simplifies the handshake process [5]. Because TLS 1.3 removes static
RSA and static Diffie-Hellman key exchange, its full handshakes use ephemeral (EC)DHE keys
and therefore provide forward secrecy: even if a long-term key is compromised in the
future, previously recorded communication remains secure, greatly enhancing privacy
protection. The efficiency and security advantages of the new handshake are discussed in
depth in an analytical paper detailing the improvements of the new mechanism [6]. Another
study analyzed the protocol with formal methods and, using security protocol verification
tools, proved its security under various attack models, including man-in-the-middle
attacks, replay attacks, and key leakage, providing mathematically rigorous verification
of the protocol's security.
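Even after the handshake, every TLS record still begins with a 5-byte plaintext header (content type, legacy version, body length), which is part of the metadata that remains observable in encrypted traffic. The following Python sketch is illustrative and not part of the method described here: it assumes an already reassembled TCP payload, does not handle truncated records, and the helper name `iter_tls_records` is hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

# TLS record content types defined in RFC 5246 / RFC 8446
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}


class TLSRecord(NamedTuple):
    content_type: str
    legacy_version: int  # 0x0303 on the wire for TLS 1.2 and most TLS 1.3 records
    length: int          # length of the (possibly encrypted) record body in bytes


def iter_tls_records(payload: bytes) -> Iterator[TLSRecord]:
    """Yield the plaintext header of each TLS record in a reassembled TCP payload (hypothetical helper)."""
    offset = 0
    while offset + 5 <= len(payload):
        ctype, version, length = struct.unpack("!BHH", payload[offset:offset + 5])
        yield TLSRecord(CONTENT_TYPES.get(ctype, f"unknown({ctype})"), version, length)
        offset += 5 + length  # skip the record body without decrypting it


# Synthetic payload: a 4-byte handshake record followed by a 10-byte application-data record
payload = (bytes([22, 0x03, 0x01, 0x00, 0x04]) + b"\x00" * 4
           + bytes([23, 0x03, 0x03, 0x00, 0x0A]) + b"\xaa" * 10)
records = list(iter_tls_records(payload))
for record in records:
    print(record)
```

The record lengths recovered this way are exactly the kind of size information that the packet-size features in Section 3 build on.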
Overall, the SSL/TLS protocol has evolved to significantly strengthen its resistance
to security threats. From the handshake mechanism to forward secrecy, each improvement
aims to ensure secure delivery while improving the user experience and reducing risk.
Security is an ongoing process that requires constant attention to, and updates of,
protocol security features to keep pace with changes in the threat environment.
To address these challenges, we draw on recent advancements in intrusion detection
and prevention systems. Bhati and Rai [31] conducted an analysis of support vector machine (SVM)-based intrusion detection techniques,
highlighting their effectiveness in identifying various types of cyber threats. Their
study provides valuable insights into the strengths and limitations of SVMs in detecting
intrusions. Additionally, Bhati et al. [32] presented a comprehensive study of intrusion detection and prevention systems, covering
a wide range of methodologies and their practical applications. These studies underscore
the importance of integrating advanced machine learning techniques and robust feature
extraction methods to enhance the detection and prevention of cyberattacks. By leveraging
these insights, our proposed approach aims to bridge the gap in detecting malicious
activities in SSL/TLS encrypted traffic, ensuring a more secure and resilient network
environment.
2.2. Encrypted Traffic Analysis Technology
In recent years, encrypted traffic analysis technology has made significant progress
in dealing with increasingly complex network attacks, especially in feature engineering,
application of deep learning models, and development of automated analysis tools.
Recent studies have shown that, by combining deep packet inspection techniques with
behavioral analysis, researchers can extract features along more dimensions, such as
the distribution of inter-packet time intervals within a flow and statistics of specific
protocol fields [7]. A method combining deep packet inspection with flow-level features
has been proposed that effectively improves the classification accuracy of encrypted
traffic and shows the potential of feature engineering in encrypted traffic analysis. Deep learning
models, especially attention mechanisms and the application of graph neural networks
(GNNs) to encrypted traffic analysis, bring new perspectives to understanding complex
traffic behavior patterns [8]. A graph neural network model based on attention mechanism is introduced to improve
the accuracy and efficiency of anomaly detection by learning the complex relationship
between traffic data. This approach exploits structural characteristics of traffic
data to reveal hidden patterns in traffic behavior [9]. In order to meet the demand of real-time analysis of massive encrypted traffic,
automated analysis tools and platforms have become research hotspots. For example,
Sarfaraz [10] proposes an automated analysis framework that integrates feature extraction, model
training and real-time monitoring, realizing end-to-end automation from data preprocessing
to attack detection, greatly improving analysis efficiency and response speed, and
reducing labor costs. With the implementation of data protection regulations such
as the GDPR and CCPA, analyzing encrypted traffic without violating user privacy
has become a new research direction. Liu et al. [11] summarize strategies and techniques for security analysis that preserve user
privacy, including differential privacy, homomorphic encryption and other methods,
providing a compliance path for encrypted traffic analysis.
2.3. Machine Learning and Deep Learning Models
In the broad field of encrypted traffic analysis and network security, the application
of machine learning and deep learning techniques is increasingly becoming the backbone
of identifying complex attack behaviors. Recent research results not only expand our
understanding of these technologies, but also facilitate their effective deployment
in practical defense systems. Researchers have extensively reviewed the application
of various machine learning algorithms in network security. Among them, the Support
Vector Machine (SVM) has become a powerful tool for distinguishing normal from abnormal
traffic owing to its strong classification ability in high-dimensional spaces, while
Random Forests and Gradient Boosting Trees improve the accuracy and robustness of
recognizing complex patterns by combining many decision trees into an ensemble. These
algorithms perform well in classification and anomaly detection tasks,
providing a solid foundation for network security analysis. Deep learning, especially
convolutional neural networks (CNN) and recurrent neural networks (RNN), has shown
unprecedented potential for processing complex pattern recognition in encrypted traffic.
Yang et al. [12] show how these deep learning models capture time-series information
while learning hidden features of encrypted traffic. A CNN can automatically
extract features from high-dimensional data through multiple convolutional layers
and efficiently analyze the structural patterns of traffic, while RNNs and variants
such as LSTM (Long Short-Term Memory) networks use gated memory cells not only to
preserve the continuity of the time series but also to capture long-term dependencies
in traffic behavior, significantly improving the accuracy and granularity of
attack identification [13].
Recent research has also focused on improving the generalization and computational
efficiency of models, for example, using lightweight techniques to reduce model size
and improve deployment speed, reusing pre-trained models on limited network security
data through transfer learning, and using ensemble learning strategies to combine the
strengths of multiple models and improve overall recognition robustness. At the same
time, given the continuous evolution of encryption protocols and the growing complexity
of attack methods, efficient privacy-preserving feature extraction, self-updating
models, and real-time processing in resource-constrained edge computing
environments have become frontier problems of current research
[14].
3. Feature Engineering and Optimization
3.1. Feature Selection Principle
Feature selection is the starting point of feature engineering; its core task is
to identify which information is critical for distinguishing normal from malicious
encrypted traffic. In this study, feature selection was guided by the following principle:
Statistical attributes and temporal behavior characteristics: Statistical characteristics
such as the packet size distribution (mean, median, mode, skewness, and kurtosis), connection
duration, and inter-packet time summarize the basic characteristics of traffic.
Temporal features, such as autocorrelation coefficients, the rate of change of traffic
within sliding windows, and periodicity metrics, capture the dynamics of traffic over
time and are particularly critical for identifying covert attacks [15].
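A minimal sketch of the packet-size distribution statistics listed above, using only Python's standard library (3.8 or later). The function name `packet_size_stats` is hypothetical, and we assume population moments and excess kurtosis (a normal distribution scores 0); the text does not state which conventions are used.

```python
import statistics
from typing import Dict, Sequence


def packet_size_stats(sizes: Sequence[float]) -> Dict[str, float]:
    """Distribution-shape statistics of the packet sizes in one flow (hypothetical helper)."""
    n = len(sizes)
    mean = statistics.fmean(sizes)
    std = statistics.pstdev(sizes, mean)  # population standard deviation
    # Third and fourth central moments, used for skewness and excess kurtosis
    m3 = sum((x - mean) ** 3 for x in sizes) / n
    m4 = sum((x - mean) ** 4 for x in sizes) / n
    return {
        "mean": mean,
        "median": statistics.median(sizes),
        "mode": statistics.mode(sizes),
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / std ** 4 - 3.0 if std > 0 else 0.0,
    }


# Hypothetical flow: mostly small control packets plus a few full-size data packets
flow_stats = packet_size_stats([60, 60, 60, 1500, 1500, 60, 60])
print(flow_stats)
```

A flow dominated by small packets with occasional large ones yields a positive skewness, which is one way such shape statistics separate traffic types that share the same mean.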
3.2. Feature Set Construction
Traditional statistical features provide a macro perspective for understanding the
basic properties of network traffic. First, the Mean Packet Size is calculated by
Eq. (1), where $N$ represents the size of a single packet and $N$ is the total number of
packets, reflecting the size of the overall data transmission. Then, Standard Deviation,
which describes the fluctuation range of packet size using Eq. (2), reveals the stability of transmission. In addition, Inter-packet Time Interval (IAT)
analysis, which uses time series analysis techniques to capture the rhythm of communication
patterns by calculating the difference in arrival times of adjacent packets, is critical
to identifying abnormal traffic patterns [16].
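The sketch below assumes the usual forms of Eqs. (1) and (2), $\bar{s} = \frac{1}{N}\sum_{i=1}^{N} s_i$ and $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(s_i - \bar{s})^2}$ (population standard deviation), where $s_i$ is the size of the $i$-th packet, and derives IAT values as differences of sorted packet timestamps. The function name and the returned feature names are hypothetical.

```python
import numpy as np


def basic_flow_features(timestamps, sizes):
    """Packet-size mean/std (assumed Eqs. (1)-(2)) and inter-arrival-time statistics for one flow."""
    t = np.asarray(timestamps, dtype=float)  # assumed sorted in arrival order
    s = np.asarray(sizes, dtype=float)
    iat = np.diff(t)                         # arrival-time difference between adjacent packets
    return {
        "mean_size": float(s.mean()),        # (1/N) * sum of packet sizes
        "std_size": float(s.std()),          # population standard deviation (ddof=0)
        "duration": float(t[-1] - t[0]),
        "mean_iat": float(iat.mean()) if iat.size else 0.0,
        "std_iat": float(iat.std()) if iat.size else 0.0,
    }


features = basic_flow_features([0.0, 0.1, 0.3, 0.6], [100, 200, 300, 400])
print(features)
```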
To uncover the dynamic behavior patterns of encrypted traffic, more advanced analytical
techniques are needed that not only capture subtle variations in time series but also
reveal underlying periodicity and complex dependencies. Among them, the Autocorrelation
Function (ACF), a key tool in time series analysis, is computed with Eq. (3).
By calculating the correlation between points of the sequence at different time lags,
this formula reveals the interdependence of data points over time and can effectively
identify periodic patterns, which makes it highly sensitive to malicious activities that
follow regular schedules.
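Eq. (3) is not reproduced in the text, so this NumPy sketch assumes the standard sample ACF, $\rho(k) = \sum_{t=1}^{N-k}(x_t - \bar{x})(x_{t+k} - \bar{x}) / \sum_{t=1}^{N}(x_t - \bar{x})^2$. The synthetic series imitates a hypothetical beacon that sends traffic every fourth time step; a strong peak at lag 4 is the kind of periodicity signal described above.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation rho(k) for k = 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return np.array([np.dot(d[: len(x) - k], d[k:]) / denom for k in range(max_lag + 1)])


# Bytes sent per time step by a hypothetical beacon that fires every 4th step
bytes_per_step = np.tile([500.0, 0.0, 0.0, 0.0], 10)
rho = acf(bytes_per_step, max_lag=8)
dominant_lag = int(np.argmax(rho[1:])) + 1  # skip lag 0, which is always 1
print(np.round(rho, 3), "dominant lag:", dominant_lag)
```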
Sliding-window statistics further improve the ability to react quickly to changes in
the time series. Specifically, a window of fixed size is moved step by step along the
time series, and statistics are computed on the data in each window, such as the mean
and standard deviation of the window given by Eqs. (4) and (5); these reflect
instantaneous changes and trends in the data in real time. This
dynamic analysis is particularly effective at capturing transient but highly indicative
patterns of anomalies, such as brief spikes in traffic or unusual lulls, which are
key signals that are difficult to detect with traditional static analysis [17,
18].
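The following sketch computes the mean and standard deviation of every full window (our assumed reading of Eqs. (4) and (5)) and adds a hypothetical spike rule: a point is flagged when it lies more than `z` window standard deviations from the mean of the window just before it. It relies on `numpy.lib.stride_tricks.sliding_window_view`, available from NumPy 1.20; the function names and the threshold are illustrative.

```python
import numpy as np


def rolling_mean_std(x, window):
    """Mean and population std of each full window of length `window`."""
    x = np.asarray(x, dtype=float)
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(x)")
    windows = np.lib.stride_tricks.sliding_window_view(x, window)  # shape (len(x) - window + 1, window)
    return windows.mean(axis=1), windows.std(axis=1)


def flag_spikes(x, window, z=3.0):
    """Indices whose value deviates from the preceding window's mean by more than z window-stds."""
    x = np.asarray(x, dtype=float)
    mean, std = rolling_mean_std(x[:-1], window)  # window i covers x[i : i + window]
    following = x[window:]                        # the point right after window i
    deviates = np.abs(following - mean) > z * np.maximum(std, 1e-9)
    return np.flatnonzero(deviates) + window


traffic = [10, 11, 9, 10, 10, 11, 9, 10, 80, 10]  # hypothetical packets per second
print(flag_spikes(traffic, window=4))
```

Comparing each point against the window that precedes it, rather than one that contains it, keeps a spike from inflating its own baseline.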
3.3. Feature Optimization and Selection
Feature optimization and selection is a key step in encrypted traffic analysis. It
aims to shrink the feature set by removing redundancy, thereby improving the efficiency
of the model and its interpretability, while maintaining or improving the accuracy
of identifying malicious behavior. This process relies on the in-depth application
of information theory methods and feature importance evaluation techniques to screen
out the most effective feature sets in a scientific and systematic manner. The specific
flow framework is shown in Fig. 2.
Fig. 2. Feature optimization and selection.
Information gain is a powerful measure of the value of a feature. Its formula
is $IG(A,C) = H(C) - H(C \mid A)$, where $H(C)$ is the entropy that quantifies
the uncertainty of the classification label $C$, and $H(C \mid A)$ is the conditional entropy
of $C$ given feature $A$. The information gain measures how much the uncertainty
about classification $C$ is reduced once feature $A$ is known; the
larger the gain, the stronger the feature's discriminative ability [19].
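A direct implementation of $IG(A,C) = H(C) - H(C \mid A)$ with entropy in bits. This sketch assumes a discrete feature; continuous features such as packet size would first have to be binned, and the binning scheme is left open here. The toy labels and feature values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """H(C) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(A, C) = H(C) - H(C | A) for a discrete (or pre-binned) feature A."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [c for a, c in zip(feature, labels) if a == value]
        conditional += (count / n) * entropy(subset)  # P(A = value) * H(C | A = value)
    return entropy(labels) - conditional


labels = ["benign", "benign", "malicious", "malicious"]
print(information_gain(["low", "low", "high", "high"], labels))  # perfectly separating feature
print(information_gain(["x", "y", "x", "y"], labels))            # uninformative feature
```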
At the feature selection strategy level, several methods are applied in parallel to
optimize the feature set efficiently. Recursive elimination gradually removes the
features with the smallest information gain contribution until a preset threshold or
the best model performance is reached. The wrapper method, implemented as an iterative
greedy search, optimizes an overall evaluation metric of the feature set by directly
adding or removing features. Embedded methods integrate feature selection into model
training; for example, a random forest's feature importance scores indirectly guide the
evaluation of feature value, so that feature optimization and model learning happen
together. These three strategies complement each other and work together to build
the best feature set for the task [20,
21].
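As one possible realization, the sketch below combines recursive elimination with an embedded importance score using scikit-learn's `RFE` wrapped around a `RandomForestClassifier`. Note the swap: `RFE` ranks features by the forest's impurity-based importances rather than by information gain as described above, and the synthetic data from `make_classification` only stands in for a real flow-feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for a flow-feature matrix: 20 features, 5 of them informative
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

# Each round refits the forest and drops the feature with the lowest importance score
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=5, step=1)
selector.fit(X, y)
X_selected = selector.transform(X)
print("kept feature indices:", selector.get_support(indices=True))
```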
Finally, cross-validation is used as the gold-standard evaluation strategy, as in
Eq. (6), where $E_i$ represents the test error of the $i$-th fold and $k$ is the number of folds,
to comprehensively evaluate the performance of different feature subsets. In this
way, a feature set that is stable and optimal across different data splits can be
objectively selected, ensuring that the model generalizes to unseen data and improving
the accuracy and reliability of encrypted traffic analysis.
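A sketch of comparing feature subsets by $k$-fold cross-validation error, assuming Eq. (6) is the average of the per-fold test errors, $\frac{1}{k}\sum_{i=1}^{k} E_i$, with $E_i$ taken as 1 minus the fold accuracy. The classifier, the candidate subsets, and the synthetic data are illustrative choices, not the configuration used in this paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=400, n_features=15, n_informative=6, random_state=0)


def cv_error(X_subset, y, k=5):
    """(1/k) * sum of per-fold test errors E_i, with E_i = 1 - fold accuracy."""
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    acc = cross_val_score(GradientBoostingClassifier(random_state=0), X_subset, y,
                          cv=folds, scoring="accuracy")
    return float(np.mean(1.0 - acc))


candidate_subsets = {"first_5": list(range(5)), "all_15": list(range(15))}
errors = {name: cv_error(X[:, cols], y) for name, cols in candidate_subsets.items()}
print(errors)  # the subset with the lowest mean error would be kept
```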
4. Multi-model Fusion and Adaptive Learning Mechanism
4.1. Model Selection and Pretreatment
The detailed model framework of this paper is shown in Fig. 3. Specifically, we fuse Gradient Boosting Trees (GBT), Convolutional Neural Networks
(CNN), and online learning and feedback adjustment mechanisms to achieve efficient
feature extraction, iterative optimization, and dynamic adaptation for complex data.
Gradient Boosting Trees (GBT) are a powerful tool in this field, especially for problems
that are not linearly separable. GBT builds a series of weak learners (usually decision
trees) step by step, with each step correcting the prediction errors of the previous
step. Specifically, each new model focuses on the samples that the previous model
failed to fit, and the model is updated iteratively by minimizing
the loss function. This process can be summarized in Eq. (7)
[22,
23]:
$$F_m(x) = F_{m-1}(x) + \nu\, h_m(x), \qquad h_m = \arg\min_{h} \sum_{i=1}^{n} L\big(y_i,\, F_{m-1}(x_i) + h(x_i)\big),$$
where $F_m(x)$ is the prediction of the $m$-th model, $h_m(x)$ is the newly added weak learner, $L$ is
the loss function, and $\nu$ is the learning rate; the optimization process aims to minimize
the overall loss of the model.
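To make the additive update concrete, the sketch below implements gradient boosting for squared loss, where the negative gradient is simply the residual: each new shallow tree is fitted to the current residuals and added with learning rate `lr` (the $\nu$ in $F_m(x) = F_{m-1}(x) + \nu h_m(x)$). Squared loss is chosen for simplicity; a GBT classifier such as the one used here would typically use log-loss instead. The helper names are hypothetical; the trees come from scikit-learn's `DecisionTreeRegressor`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gbt_squared_loss(X, y, n_rounds=50, lr=0.1, max_depth=2):
    """Gradient boosting for L = (y - F)^2 / 2, whose negative gradient is the residual y - F."""
    f0 = float(np.mean(y))                   # F_0: constant initial prediction
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                 # negative gradient of L with respect to F_{m-1}
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residuals)
        pred = pred + lr * tree.predict(X)   # F_m = F_{m-1} + nu * h_m
        trees.append(tree)
    return f0, lr, trees


def predict_gbt(model, X):
    f0, lr, trees = model
    pred = np.full(len(X), f0)
    for tree in trees:
        pred += lr * tree.predict(X)
    return pred


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
mse = {}
for rounds in (5, 100):
    model = fit_gbt_squared_loss(X, y, n_rounds=rounds)
    mse[rounds] = float(np.mean((predict_gbt(model, X) - y) ** 2))
print(mse)  # training error shrinks as more weak learners are added
```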
Convolutional Neural Networks (CNNs) show great potential in traffic analysis,
especially in identifying local patterns and sequence features in traffic. A CNN uses
convolutional layers to detect local features; weight sharing reduces the number of
parameters, and translation invariance improves the efficiency of the model. For traffic
data, a CNN can extract meaningful features from the temporal arrangement of packets,
such as specific protocol patterns or abnormal communication patterns. Its core convolution
operation can be expressed as Eq. (8), $Y = f(W * X + b)$, where $Y$ denotes the output
feature map, $X$ is the input, $W$ is the convolution kernel, $*$ denotes the convolution
operation, $b$ is the bias, and $f$ is the activation function; through this operation a
CNN is able to capture spatial and temporal features in the data [24].
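A single-channel, 1-D NumPy sketch of $Y = f(W * X + b)$ with ReLU as $f$ and "valid" padding. Like most deep-learning libraries, it computes cross-correlation (the kernel is not flipped). The normalized packet-size sequence and the kernel, which responds to a small-large-small pattern, are hypothetical examples.

```python
import numpy as np


def conv1d_relu(x, w, b):
    """Y = f(W * X + b) for a single-channel 1-D input, 'valid' padding, ReLU activation."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    k = len(w)
    # Slide the kernel over the input (cross-correlation, as in deep-learning "convolution" layers)
    z = np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)]) + b
    return np.maximum(z, 0.0)  # ReLU


# Packet sizes of one flow, scaled by the 1500-byte MTU
sizes = np.array([60, 1500, 60, 60, 60, 1500, 60], dtype=float) / 1500.0
out = conv1d_relu(sizes, w=[-1.0, 2.0, -1.0], b=-0.5)
print(out)  # positive only where a large packet is surrounded by small ones
```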
Preprocessing steps are also indispensable, including data cleaning, normalization,
standardization, and missing-value handling, to ensure that the model can learn effectively.
For example, standardization improves model convergence speed and stability by
subtracting the mean and dividing by the standard deviation, $z = (x - \mu)/\sigma$ as in Eq. (9), which rescales each feature to zero mean and unit variance [25].
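A minimal standardization sketch. The design choice worth noting is that $\mu$ and $\sigma$ are estimated on the training split only and then reused for validation and test data, which avoids leaking test statistics into training; scikit-learn's `StandardScaler` follows the same fit/transform pattern. Leaving constant features unscaled to avoid division by zero is our assumption, and the class name is hypothetical.

```python
import numpy as np


class ZScoreScaler:
    """z = (x - mu) / sigma, with mu and sigma estimated on the training data only."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        self.std_ = np.where(std > 0, std, 1.0)  # constant features are left unscaled
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_


train = np.array([[100.0, 0.1], [300.0, 0.3], [200.0, 0.2]])  # e.g. mean packet size, mean IAT
scaler = ZScoreScaler().fit(train)
Z_train = scaler.transform(train)
Z_new = scaler.transform([[400.0, 0.2]])  # new data reuses the training statistics
print(Z_train, Z_new)
```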
To sum up, model selection and preprocessing are the key to building an efficient
encrypted traffic analysis system. Through the iterative optimization of gradient boosting
trees and the spatial-temporal feature extraction of CNNs, combined with an appropriate
preprocessing strategy, the recognition ability of the model can be greatly improved,
providing a solid foundation for security protection.
4.2. Model Fusion Strategy
In the complex environment of encrypted traffic analysis, it is often difficult for
a single model to adequately capture all types of attack behavior and patterns. Therefore,
model fusion strategy becomes a key means to improve prediction performance, robustness
and generalization ability. Ensemble learning method refers to combining the prediction
results of multiple basic models to achieve better results than single model through
diversity and complementarity. Weighted averaging is one of the most intuitive ways
to fuse, by assigning a weight to each model and then calculating a weighted average
prediction. If there are $M$ models with prediction values $\hat{y}_1$,
$\hat{y}_2$, ..., $\hat{y}_M$ and weights $w_1$, $w_2$, ..., $w_M$, then
the final prediction is given by Eq. (10), $\hat{y} = \sum_{m=1}^{M} w_m \hat{y}_m$, where the weights are typically non-negative and sum to 1; with equal weights $w_m = 1/M$, this reduces to a simple average [26].
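A sketch of Eq. (10) for two base models, using the GBT 0.4 / CNN 0.6 configuration from Table 2. The per-flow malicious-class probabilities are made up for illustration, and the 0.5 decision threshold is an assumption.

```python
import numpy as np


def weighted_average_fusion(probas, weights):
    """Eq. (10): sum over models of w_m * y_hat_m, with the weights rescaled to sum to 1."""
    P = np.asarray(probas, dtype=float)  # shape (M, n_samples)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ P                         # shape (n_samples,)


gbt_proba = [0.2, 0.7, 0.55]   # hypothetical malicious-class probabilities from GBT
cnn_proba = [0.4, 0.9, 0.35]   # hypothetical malicious-class probabilities from CNN
fused = weighted_average_fusion([gbt_proba, cnn_proba], weights=[0.4, 0.6])
labels = (fused >= 0.5).astype(int)  # assumed decision threshold
print(fused, labels)
```

In the third flow the two models disagree; the larger CNN weight pulls the fused score below the threshold.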
We adopt Boosting as the ensemble learning framework, which gradually strengthens
the model by iteratively training weak classifiers, each time focusing on the samples
misclassified in the previous round. Weight assignment and result fusion further refine
the strategy of model integration. In practice, in addition to evenly assigning weights,
weights are often dynamically adjusted based on the performance of the model to optimize
overall performance. Stacking is an advanced fusion strategy that trains a meta-model
(usually logistic regression or linear regression) on the outputs of the base models,
as in Eq. (11)
[28],
where $P(c \mid \hat{y}_1, \ldots, \hat{y}_M)$ is the meta-model's predicted probability for class $c$, based on the outputs
of all base models.
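A scikit-learn sketch of Stacking with a logistic-regression meta-model, which `StackingClassifier` trains on out-of-fold predicted probabilities of the base models. Because a CNN is not available in scikit-learn, a `RandomForestClassifier` stands in for it here; the synthetic data and hyperparameters are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("gbt", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),  # stand-in for the CNN
    ],
    final_estimator=LogisticRegression(),  # the meta-model
    stack_method="predict_proba",          # base models' class probabilities feed the meta-model
    cv=5,                                  # out-of-fold predictions keep the meta-model honest
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
print(f"stacking test accuracy: {test_accuracy:.3f}")
```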
Results fusion can also be achieved by voting mechanisms, including hard voting (direct
selection of the category with the most votes) and soft voting (probability averaging),
which are effective in classification tasks. For example, soft voting
can be expressed as Eq. (12), $\hat{c} = \arg\max_{c} \frac{1}{M}\sum_{m=1}^{M} p_{m,c}$, where $p_{m,c}$ is the prediction probability of the $m$-th model for class $c$.
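The sketch below contrasts hard and soft voting on one sample where they disagree: two models lean weakly toward class 1 (malicious) while a third is confident about class 0 (benign). The probabilities are hypothetical; `P` has shape (models, samples, classes).

```python
import numpy as np


def hard_vote(probas):
    """Each model votes for its argmax class; ties go to the lowest class index."""
    P = np.asarray(probas)
    votes = P.argmax(axis=2)  # shape (M, n_samples)
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=P.shape[2])  # (n_classes, n_samples)
    return counts.argmax(axis=0)


def soft_vote(probas):
    """Eq. (12): argmax over c of (1/M) * sum_m p_{m,c}."""
    return np.asarray(probas).mean(axis=0).argmax(axis=1)


P = np.array([
    [[0.45, 0.55]],  # model 1: weakly malicious
    [[0.45, 0.55]],  # model 2: weakly malicious
    [[0.90, 0.10]],  # model 3: confidently benign
])
print("hard:", hard_vote(P), "soft:", soft_vote(P))
```

Hard voting counts two weak votes for class 1, while soft voting lets the third model's confidence outweigh them (average probabilities 0.6 vs. 0.4).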
In summary, through well-designed ensemble learning methods and results fusion strategies,
we can effectively integrate the predictive capabilities of multiple models, not only
improving the accuracy and robustness of encrypted traffic analysis, but also adapting
to a wider range of security challenges to ensure reliable protection of network environments.
4.3. Adaptive Learning Mechanism
Online learning means that the model updates its weights and parameters in real time
as it receives new data, without having to retrain the entire model. This requires
algorithms with low-latency update capability to respond quickly to new samples. The
update formula for online learning can be expressed as Eq. (13)
[29],
$$\theta_{t+1} = \theta_t - \eta\, g_t,$$
where $\theta_t$ is the model parameter at time $t$, $\eta$ is the learning rate, and $g_t$ is the
update direction computed from the current sample (such as the gradient of the loss). This
formula embodies the iterative update logic of the parameters and ensures that the model
evolves in real time with new data streams.
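As an instance of this online update, the sketch below trains a logistic-regression model one sample at a time, taking $g_t$ as the gradient of the log-loss on the current sample. The class name, learning rate, and simulated stream are assumptions for illustration.

```python
import numpy as np


class OnlineLogisticRegression:
    """Logistic regression updated with theta_{t+1} = theta_t - eta * g_t, one sample at a time."""

    def __init__(self, n_features, eta=0.1):
        self.theta = np.zeros(n_features + 1)  # last entry is the bias term
        self.eta = eta

    @staticmethod
    def _augment(x):
        return np.append(np.asarray(x, dtype=float), 1.0)

    def predict_proba(self, x):
        return 1.0 / (1.0 + np.exp(-self._augment(x) @ self.theta))

    def update(self, x, y):
        """One step on a single sample; g_t is the log-loss gradient, y in {0, 1}."""
        g_t = (self.predict_proba(x) - y) * self._augment(x)
        self.theta -= self.eta * g_t


rng = np.random.default_rng(0)
model = OnlineLogisticRegression(n_features=2, eta=0.1)
for _ in range(2000):                      # simulated stream: label is 1 when x0 + x1 > 0
    x = rng.normal(size=2)
    model.update(x, float(x.sum() > 0))

X_eval = rng.normal(size=(500, 2))
accuracy = float(np.mean([(model.predict_proba(x) > 0.5) == (x.sum() > 0) for x in X_eval]))
print(f"accuracy on fresh samples: {accuracy:.3f}")
```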
Feedback adjustment is a key component of adaptive learning. It collects model prediction
errors or misclassified instances through performance monitoring and evaluation
mechanisms and feeds them back into the model adjustment strategy. The adjustment process
may involve learning-rate decay, stronger regularization, or feature selection optimization.
The specific adjustment formula varies with the strategy; one example is Eq.
(14),
$$\theta_{t+1} = \theta_t - \eta\big(\nabla L(\theta_t) + \lambda \nabla R(\theta_t)\big),$$
where $L$ is the loss function, $R$ is the regularization term, and $\lambda$ is its weight.
The feedback adjustment acts on the parameter update together with the loss, helping the
model absorb new knowledge and correct its errors.
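A sketch of one regularized update step, assuming $R(\theta) = \frac{1}{2}\lVert\theta\rVert^2$ so that $\nabla R(\theta) = \theta$, together with a hypothetical feedback rule that halves the learning rate and strengthens regularization when the monitored error rises. The rule and its constants are our assumptions, not the adjustment policy used in this paper.

```python
import numpy as np


def regularized_step(theta, grad_loss, eta, lam):
    """theta - eta * (grad L + lam * grad R), with R(theta) = 0.5 * ||theta||^2, so grad R = theta."""
    return theta - eta * (grad_loss + lam * theta)


def feedback_adjust(eta, lam, prev_error, curr_error, decay=0.5, lam_step=1e-3):
    """Hypothetical rule: if the monitored error rises, decay eta and strengthen regularization."""
    if curr_error > prev_error:
        return eta * decay, lam + lam_step
    return eta, lam


theta = np.array([1.0, -2.0])
theta_new = regularized_step(theta, grad_loss=np.array([0.5, 0.5]), eta=0.1, lam=0.01)
adjusted = feedback_adjust(0.1, 0.01, prev_error=0.05, curr_error=0.08)  # error rose
print(theta_new, adjusted)
```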
The model update and optimization strategy focuses on how to efficiently utilize new
data throughout the model lifecycle, either through incremental learning, periodic
retraining, or fusion of old and new models. For example, a simplified version
of the update formula for incremental learning may be Eq. (15)
[30],
$$w_{\text{new}} = w_{\text{old}} + \Delta w,$$
where $\Delta w$ represents the parameter increment calculated from the new data. In this
way, the model retains what it has already learned while incorporating new knowledge,
reducing the computational burden.
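Incremental learning in the sense of $w_{\text{new}} = w_{\text{old}} + \Delta w$ can be sketched with scikit-learn's `SGDClassifier.partial_fit`, which updates the existing coefficients from a new batch instead of retraining from scratch. The batch generator is hypothetical, and a linear model stands in for the models used in this paper.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)


def make_batch(n):
    """Hypothetical traffic batch: label 1 when the first two features sum above zero."""
    X = rng.normal(size=(n, 4))
    return X, (X[:, 0] + X[:, 1] > 0).astype(int)


clf = SGDClassifier(random_state=0)  # linear model trained by SGD (hinge loss by default)
X_old, y_old = make_batch(500)
clf.partial_fit(X_old, y_old, classes=np.array([0, 1]))  # the first call must list all classes
w_old = clf.coef_.copy()

X_new, y_new = make_batch(500)
clf.partial_fit(X_new, y_new)        # update from the new batch only, no full retraining
delta_w = clf.coef_ - w_old          # the increment Delta w

X_test, y_test = make_batch(1000)
test_accuracy = clf.score(X_test, y_test)
print("delta_w:", np.round(delta_w, 3), "test accuracy:", round(test_accuracy, 3))
```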
To sum up, through online learning, feedback adjustment, and model update and optimization
strategies, the adaptive learning mechanism enables the encrypted traffic analysis system to
evolve continuously, so that it can maintain high efficiency and accuracy in a changing
threat environment and adjust its strategy in time to ensure the long-term effectiveness
and reliability of network protection. Implementing these strategies not only improves
the real-time performance of analysis but also brings more dynamic and proactive defense
mechanisms to the field of network security.
5. Comprehensive Performance Evaluation and Dynamic Adjustment
5.1. Performance Assessment Framework
In encrypted traffic analysis, building efficient and reliable models
is not just about high prediction accuracy, but requires a comprehensive and detailed
performance evaluation framework to ensure the effectiveness, efficiency and sustainability
of the model in practical applications. The framework should cover multiple dimensions,
including but not limited to predictive performance metrics, resource consumption
estimates, and processing speed considerations for the model. These key assessment
elements are elaborated below.
Accuracy is the most intuitive evaluation index, defined as the proportion of the
number of correctly classified samples to the total number of samples. However, in
class-imbalanced datasets, accuracy can be misleading because it does not account
for unequal proportions of positive and negative samples. The F1 score is the harmonic
mean of precision and recall and is intended to provide a single metric that balances
both. The F1 score reflects the overall performance of the model more accurately
when the classes are imbalanced. A higher F1 score indicates that the model has found
a better balance between precision and recall. Resource consumption includes
hardware resource requirements during model training and prediction, such as CPU,
GPU usage, memory footprint, and disk space. In actual deployment, resource efficiency
directly affects system scalability and cost-effectiveness. For example, model training
can require significant computational resources and time, so reducing training time,
optimizing memory usage, and reducing power consumption are important considerations.
The processing speed, or inference time of the model, is the time required for the
model to make predictions on a single sample or batch of samples. In real-time analytics
scenarios, rapid response is the deciding factor, and low latency helps identify and
respond to security threats instantly. Slow processing speed not only affects the
user experience, but may also cause delays in responding to security incidents, thereby
expanding the impact of attacks.
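To illustrate why accuracy can mislead on imbalanced data, this sketch computes the metrics from confusion-matrix counts. The labels are a hypothetical toy case with 5 malicious flows out of 100: a detector that always predicts "benign" still reaches 0.95 accuracy but has an F1 score of 0.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [0] * 95 + [1] * 5                    # 95 benign (0), 5 malicious (1)
always_benign = [0] * 100
detector = [0] * 93 + [1] * 2 + [1] * 4 + [0]  # 2 false alarms, 4 hits, 1 miss
baseline_metrics = classification_metrics(y_true, always_benign)
detector_metrics = classification_metrics(y_true, detector)
print(baseline_metrics)
print(detector_metrics)
```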
5.2. Experimental Design and Analysis of Results
5.2.1 Data set description and experimental setup
This evaluation uses a comprehensive encrypted traffic dataset that covers a wide
range of normal network activity and examples of encrypted malicious behavior, providing
rich learning material for the model. The data is split into a training set
(70%), a validation set (15%), and a test set (15%) for model training, tuning, and
performance testing, respectively. The experiments are implemented in Python using
tools such as Scikit-Learn, TensorFlow, and PyTorch, and are executed on
high-performance computing facilities.
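One way to obtain the 70/15/15 split is two successive calls to scikit-learn's `train_test_split`. Stratifying on the label, which keeps the benign/malicious ratio equal across the three sets, is our assumption; the text does not say whether the split was stratified. The data here is a placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the encrypted-traffic feature matrix and labels
X = np.random.default_rng(0).normal(size=(1000, 10))
y = np.array([0] * 800 + [1] * 200)

# 70% train, then split the remaining 30% in half: 15% validation, 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
print(len(X_train), len(X_val), len(X_test))
```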
5.2.2 Model performance comparison and case study
To further explore the advantages of the proposed scheme, we designed comparative
experiments, including not only the performance of GBT and CNN single models, but
also the ensemble learning models with different configurations (weighted average
fusion, Bagging, Boosting and Stacking), and compared them with several advanced models
in the current field (such as XGBoost, LightGBM and ResNet). The evaluation dimensions
were extended to accuracy, recall and F1 score. The resource consumption and real-time
processing ability of the model were also investigated. The following is a detailed
overview of the experimental results, including the new configuration and metamodel
options.
Table 1 shows how the different models performed in the performance evaluation. The GBT model
has high accuracy and recall of 0.89 and 0.87 respectively, and F1 score of 0.88.
The CNN model has an accuracy of 0.92, recall of 0.86, and F1 score of 0.89. The XGBoost
model has an accuracy of 0.91, recall of 0.88, and F1 score of 0.89.
Table 1. Performance evaluation of single model and comparative model.
| Model type | Accuracy | Recall rate | F1 score |
|---|---|---|---|
| GBT | 0.89 | 0.87 | 0.88 |
| CNN | 0.92 | 0.86 | 0.89 |
| XGBoost | 0.91 | 0.88 | 0.89 |
| LightGBM | 0.90 | 0.89 | 0.89 |
| ResNet | 0.93 | 0.85 | 0.89 |
Table 2 shows the performance of the weighted average fusion model with multiple weight configurations.
With a weight configuration of GBT 0.4, CNN 0.6, the model has an accuracy of 0.93,
recall of 0.90, and F1 score of 0.91. With a weight configuration of GBT 0.5, CNN
0.5, the model has an accuracy of 0.92, recall of 0.89, and F1 score of 0.90. With
the weight configuration GBT 0.3, CNN 0.7, the model has an accuracy of 0.91, recall
of 0.91, and F1 score of 0.91. These results demonstrate the impact of different weight
configurations on model performance.
Table 2. Performance of weighted average fusion model with multiple weight configurations.
| Weight configuration | Accuracy | Recall | F1 score |
|---|---|---|---|
| GBT 0.4, CNN 0.6 | 0.93 | 0.90 | 0.91 |
| GBT 0.5, CNN 0.5 | 0.92 | 0.89 | 0.90 |
| GBT 0.3, CNN 0.7 | 0.91 | 0.91 | 0.91 |
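Weighted average fusion, as evaluated in Table 2, can be sketched as follows. The sketch assumes that each base model outputs a per-flow probability of being malicious and that the two weights sum to 1, as all three configurations in Table 2 do; the function names, the sample probabilities and the 0.5 decision threshold are hypothetical, not taken from the experiments.

```python
from __future__ import annotations


def weighted_fusion(prob_gbt: list[float], prob_cnn: list[float],
                    w_gbt: float, w_cnn: float) -> list[float]:
    """Weighted average of the malicious-class probabilities from two base models."""
    if abs(w_gbt + w_cnn - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    if len(prob_gbt) != len(prob_cnn):
        raise ValueError("both models must score the same samples")
    return [w_gbt * g + w_cnn * c for g, c in zip(prob_gbt, prob_cnn)]


def to_labels(fused: list[float], threshold: float = 0.5) -> list[int]:
    """Turn fused probabilities into 0/1 verdicts (the threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in fused]


if __name__ == "__main__":
    # Hypothetical per-flow probabilities from the two base models.
    gbt = [0.20, 0.80, 0.65]
    cnn = [0.10, 0.40, 0.90]
    for w_gbt, w_cnn in [(0.4, 0.6), (0.5, 0.5), (0.3, 0.7)]:
        fused = weighted_fusion(gbt, cnn, w_gbt, w_cnn)
        print((w_gbt, w_cnn), [round(p, 3) for p in fused], to_labels(fused))
```

Raising the CNN weight moves every fused score toward the CNN output, which is one way a weight configuration can trade accuracy against recall.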
Table 3 shows the performance of the Boosting model for different numbers of iterations.
With 25 iterations, the model reaches an accuracy of 0.92, a recall of 0.90 and an
F1 score of 0.91. With 50 iterations, accuracy rises to 0.94, recall to 0.92 and the
F1 score to 0.93. With 75 iterations, accuracy stays at 0.94 and recall rises to 0.93,
while the F1 score remains 0.93. Increasing the number of iterations from 25 to 50
therefore brings a clear gain, whereas going from 50 to 75 brings little further improvement.
Table 3. Boosting model performance for different numbers of iterations.
| Number of iterations | Accuracy | Recall | F1 score |
|---|---|---|---|
| 25 | 0.92 | 0.90 | 0.91 |
| 50 | 0.94 | 0.92 | 0.93 |
| 75 | 0.94 | 0.93 | 0.93 |
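The iteration count in Table 3 is the number of weak learners the Boosting model adds in sequence. The paper does not name its Boosting variant, so the pure-Python sketch below uses discrete AdaBoost with decision stumps only to illustrate what that count controls; the toy flows, feature meanings and function names are hypothetical.

```python
from __future__ import annotations

import math


def stump_predict(row, j, thr, polarity):
    """Predict polarity when feature j reaches the threshold, -polarity otherwise."""
    return polarity if row[j] >= thr else -polarity


def fit_stump(X, y, w):
    """Pick the single-feature threshold rule with the lowest weighted error.

    Labels are -1/+1. Returns (error, feature_index, threshold, polarity)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost(X, y, n_iterations):
    """Discrete AdaBoost; n_iterations is the number of weak learners added in sequence."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_iterations):
        err, j, thr, polarity = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, polarity))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * label * stump_predict(row, j, thr, polarity))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, j, thr, pol) for alpha, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical flows: [normalised mean packet size, normalised mean inter-arrival time].
    X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.7, 0.4], [0.8, 0.2], [0.9, 0.1]]
    y = [-1, -1, -1, 1, 1, 1]  # -1 benign, +1 malicious
    for n_iterations in (25, 50, 75):
        model = adaboost(X, y, n_iterations)
        acc = sum(predict(model, row) == label for row, label in zip(X, y)) / len(X)
        print(n_iterations, acc)
```

Each iteration fits one more stump to sample weights that emphasise earlier mistakes; once the training samples are well separated, further iterations mainly enlarge the ensemble, which is consistent with the small change between 50 and 75 iterations in Table 3.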
Table 4 shows the performance of multi-model Stacking with different meta-models.
With logistic regression (LR) as the meta-model, Stacking achieves an accuracy of 0.95,
a recall of 0.94 and an F1 score of 0.94. With a support vector machine (SVM), it achieves
an accuracy of 0.94, a recall of 0.93 and an F1 score of 0.93, and with a random forest,
an accuracy of 0.95, a recall of 0.93 and an F1 score of 0.94. LR therefore gives the
best overall result, matching random forest in accuracy and F1 score while achieving
the highest recall.
Table 4. Stacking performance with different meta-models.
| Meta-model | Accuracy | Recall | F1 score |
|---|---|---|---|
| LR | 0.95 | 0.94 | 0.94 |
| SVM | 0.94 | 0.93 | 0.93 |
| Random Forest | 0.95 | 0.93 | 0.94 |
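Stacking trains a meta-model on the outputs of the base models. The sketch below illustrates the LR option in Table 4, assuming the base models (for example GBT and CNN) each emit a malicious-class probability; it fits the logistic-regression meta-model by plain batch gradient descent, and the learning rate, epoch count and sample values are illustrative assumptions rather than the paper's settings. The meta-features are assumed to be out-of-fold predictions, so the meta-model does not learn from scores the base models produced on their own training data.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lr_meta_model(meta_X, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-model by batch gradient descent on log-loss.

    meta_X holds one row of base-model outputs per sample, e.g. [p_gbt, p_cnn];
    y holds 0/1 labels. Returns (weights, bias)."""
    n, d = len(meta_X), len(meta_X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(meta_X, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, row)) + b) - label
            for k in range(d):
                grad_w[k] += err * row[k]
            grad_b += err
        w = [wk - lr * gk / n for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w, b, row, threshold=0.5):
    """Final Stacking verdict (0/1) for one sample's base-model outputs."""
    return int(sigmoid(sum(wk * xk for wk, xk in zip(w, row)) + b) >= threshold)


if __name__ == "__main__":
    # Hypothetical out-of-fold probabilities [p_gbt, p_cnn] and true labels.
    meta_X = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = fit_lr_meta_model(meta_X, y)
    print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
    print("verdicts:", [predict_meta(w, b, row) for row in meta_X])
```

Replacing `fit_lr_meta_model` with an SVM or random forest trained on the same meta-features gives the other two rows of Table 4; the base models stay unchanged.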
Fig. 4. Resource consumption and processing speed assessment.
Fig. 4 compares the resource consumption and processing speed of the models. The
GBT model has an average training time of 120 minutes, an average prediction time
of 3.5 ms/sample and a memory footprint of 2.1 GB. The CNN model has an average training
time of 300 minutes, an average prediction time of 2.0 ms/sample and a memory footprint
of 4.5 GB. The Stacking (LR) model has an average training time of 450 minutes, an
average prediction time of 4.2 ms/sample and a memory footprint of 6.2 GB. The accuracy
gain of Stacking thus comes at the cost of the longest training time, the highest
per-sample latency and the largest memory footprint, while CNN offers the fastest prediction.
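To relate the per-sample prediction times in Fig. 4 to real-time capacity, they can be converted into throughput. The sketch below assumes samples are scored one at a time and that throughput scales linearly with the number of parallel workers, ignoring batching and I/O; both are simplifying assumptions, not measurements from the experiments.

```python
# Average prediction times (ms per sample) as reported for Fig. 4.
PREDICTION_MS = {"GBT": 3.5, "CNN": 2.0, "Stacking (LR)": 4.2}


def samples_per_second(ms_per_sample: float, workers: int = 1) -> float:
    """Throughput if each worker scores samples one after another (assumed linear scaling)."""
    return workers * 1000.0 / ms_per_sample


if __name__ == "__main__":
    for model, ms in PREDICTION_MS.items():
        print(f"{model}: about {samples_per_second(ms):.0f} samples/s per worker")
```

Under these assumptions a single worker handles about 286 samples/s with GBT, 500 with CNN and 238 with Stacking (LR).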
6. Conclusion
This paper proposes a comprehensive solution for encrypted traffic analysis that combines
feature engineering and optimization, multi-model fusion and an adaptive learning mechanism.
In the feature engineering phase, we set out the principles of feature selection,
emphasizing statistical attributes and temporal behavior features and, to meet the
requirements of covert attack recognition, the ability of features to reveal subtle
differences in traffic behavior. Beyond traditional statistical features, the autocorrelation
function and sliding window statistics are introduced to capture the dynamic behavior
patterns of traffic. In the feature optimization and selection phase, information gain
and mutual information are used, together with recursive reduction, wrapper and embedded
strategies, to select the most effective feature set. For model fusion, we describe
the implementation of the ensemble learning method, weight assignment and result fusion,
and we build an adaptive learning mechanism that combines online learning with feedback
adjustment. The ensemble combines the predictions of several base models and, through
their diversity and complementarity, outperforms any single model. Weight assignment
and result fusion refine this integration by assigning a weight to each model and
computing the weighted average of their predictions. Through online learning, feedback
adjustment, and model update and optimization strategies, the adaptive learning mechanism
lets the encrypted traffic analysis system evolve continuously and cope with changing
network threats. Experimental results show that the proposed method performs well
in accuracy, recall and F1 score, with ensemble F1 scores of 0.90 to 0.94 against
0.88 to 0.89 for the single and comparison models, which verifies its effectiveness
in identifying malicious traffic.
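The autocorrelation function and sliding window statistics mentioned above can be computed from any per-flow time series, such as packet sizes or inter-arrival times. The pure-Python sketch below assumes such a series has already been extracted; the series values, lag range and window length are hypothetical, and the autocorrelation uses the common biased estimator r_k = Σ(x_t - x̄)(x_{t+k} - x̄) / Σ(x_t - x̄)², which may differ from the paper's exact definition.

```python
from __future__ import annotations

import statistics


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for lags 1..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:  # constant series: no variation to correlate
        return [0.0] * max_lag
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def sliding_window_stats(series, window, step=1):
    """(mean, population standard deviation) for each window position."""
    out = []
    for start in range(0, len(series) - window + 1, step):
        chunk = series[start:start + window]
        out.append((statistics.fmean(chunk), statistics.pstdev(chunk)))
    return out


if __name__ == "__main__":
    # Hypothetical packet sizes (bytes) of one flow alternating small and large packets.
    sizes = [100, 1500, 100, 1500, 100, 1500]
    print("ACF lags 1-2:", [round(r, 3) for r in autocorrelation(sizes, 2)])
    print("window stats:", sliding_window_stats(sizes, window=2))
```

For the alternating series [100, 1500, 100, 1500, 100, 1500] bytes, the lag-1 autocorrelation is about -0.83 and the lag-2 value about 0.67, exposing a request/response rhythm that the per-window means (all 800) would not show.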
By adopting our approach, network administrators can improve their ability to detect
attacks hidden in encrypted traffic. Specifically, our approach can identify malicious
behaviors such as DDoS attacks, APT attacks, ransomware propagation and data leakage
more accurately. This not only improves the overall security of the system but also
reduces false positives and false negatives, allowing network administrators to respond
to and handle potential threats more quickly. In addition, the adaptive learning mechanism
lets the system keep optimizing itself and adapting to new attack patterns, thereby
providing long-term protection.
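The online learning and feedback adjustment in the adaptive learning mechanism can be sketched as a detector that updates itself each time an analyst confirms or corrects a verdict. The example assumes a logistic-regression detector trained by stochastic gradient descent; the class name, learning rate and feature vectors are hypothetical, and the paper's own update and optimization strategy may differ.

```python
from __future__ import annotations

import math


class OnlineLogisticModel:
    """Hypothetical detector: logistic regression updated one labelled sample at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x: list[float]) -> float:
        """Probability that the flow described by x is malicious."""
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x: list[float], label: int) -> None:
        """One SGD step on the log-loss for a feedback label in {0, 1}."""
        err = self.predict_proba(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


if __name__ == "__main__":
    model = OnlineLogisticModel(n_features=2, learning_rate=0.5)
    # Hypothetical feedback stream: (normalised flow features, analyst-confirmed label).
    feedback = [([0.9, 0.1], 1), ([0.2, 0.8], 0), ([0.8, 0.3], 1), ([0.1, 0.9], 0)]
    for features, label in feedback:
        before = model.predict_proba(features)
        model.update(features, label)
        print(f"p(malicious) {before:.2f} -> {model.predict_proba(features):.2f} (label {label})")
```

Because each update touches only the current sample, the detector can absorb new attack patterns without retraining from scratch; in practice a periodic full retrain would still be needed to limit drift.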
Compared with current industry practice, our approach offers advantages in several
respects. First, advanced feature extraction techniques such as the autocorrelation
function and sliding window statistics capture more complex traffic behavior patterns,
which improves detection accuracy. Second, the feature optimization and selection
process uses information gain and mutual information to select the most effective
feature set, reducing redundancy and noise. Finally, combining ensemble learning with
the adaptive learning mechanism makes the system more stable and robust when processing
large-scale, highly complex encrypted traffic. Experimental results show that our
approach outperforms existing single-model solutions in accuracy, recall and F1 score,
offering the industry stronger protection against attacks hidden in encrypted traffic.
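For a discrete feature X and class label Y, information gain equals the mutual information I(X;Y) = H(Y) - H(Y|X). The filter step of the feature selection described above can be sketched in pure Python as follows, assuming continuous features such as packet-size statistics have already been binned into discrete values; the binning, feature names and top-k cut-off are hypothetical, and the wrapper and embedded stages are not shown.

```python
from __future__ import annotations

import math
from collections import Counter


def mutual_information(feature, labels):
    """I(X;Y) in bits between a discrete (already binned) feature and class labels."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(columns, labels, top_k):
    """Names of the top_k features by mutual information with the label."""
    scores = {name: mutual_information(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical binned features for six flows and their labels (1 = malicious).
    labels = [0, 0, 0, 1, 1, 1]
    columns = {
        "pkt_size_bin": [0, 0, 1, 2, 2, 2],
        "iat_bin": [1, 0, 1, 0, 1, 0],
        "acf_lag1_bin": [0, 0, 0, 1, 1, 0],
    }
    print(rank_features(columns, labels, top_k=2))
```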
References
Zhou K., Wang W. Y., Wu C. H., Hu T., 2020, Practical evaluation of encrypted
traffic classification based on a combined method of entropy estimation and neural
networks, ETRI Journal, Vol. 42, No. 3, pp. 311-323

Ma C. C., Du X. H., Cao L. F., 2020, Improved KNN algorithm for fine-grained
classification of encrypted network flow, Electronics, Vol. 9, No. 2

Belmoukadam O., Barakat C., 2021, Unveiling the end-user viewport resolution
from encrypted video traces, IEEE Transactions on Network and Service Management,
Vol. 18, No. 3, pp. 3324-3335

Zang X. D., Gong J., Wang M. L., Gao P., Zhang G. W., 2023, IP traffic
behavior characterization via semantic mining, Journal of Network and Computer Applications,
Vol. 213, pp. 103603

Zeng X. M., Chen X. S., Shao G. L., He T., Han Z. H., Wen Y., Wang
Q. X., 2019, Flow context and host behavior based Shadowsocks's traffic identification,
IEEE Access, Vol. 7, pp. 41017-41032

Bhardwaj S., Dave M., 2023, Enhanced neural network-based attack investigation
framework for network forensics: Identification, detection, and analysis of the attack,
Computers & Security, Vol. 135, pp. 103521

Canavese D., Regano L., Basile C., Ciravegna G., Lioy A., 2022, Encryption-agnostic
classifiers of traffic originators and their application to anomaly detection, Computers
& Electrical Engineering, Vol. 97

Papadogiannaki E., Ioannidis S., 2021, Acceleration of intrusion detection in
encrypted network traffic using heterogeneous hardware, Sensors, Vol. 21, No. 4

Hao L. P., Ma Y. H., 2023, Spoofing traffic attack recognition algorithm for
wireless communication networks in a smart city based on improved machine learning,
Journal of Testing and Evaluation

Sarfaraz A., Khan A., 2018, Feature selection based correlation attack on HTTPS
secure searching, Wireless Personal Communications, Vol. 103, No. 4, pp. 2995-3008

Liu C., Xiong G., Gou G. P., Yiu S. M., Li Z., Tian Z. H., 2021,
Classifying encrypted traffic using adaptive fingerprints with multi-level attributes,
World Wide Web-Internet and Web Information Systems, Vol. 24, No. 6, pp. 2071-2097

Yang L. M., Fu S. J., Zhang X. Y., Guo S. Z., Wang Y. J., Yang C., 2022,
FlowSpectrum: A concrete characterization scheme of network traffic behavior
for anomaly detection, World Wide Web-Internet and Web Information Systems, Vol. 25,
No. 5, pp. 2139-2161

Patil P. N., Ross K. C., Boyles S. D., 2021, Convergence behavior for traffic
assignment characterization metrics, Transportmetrica A: Transport Science, Vol. 17,
No. 4, pp. 1244-1271

Feng W., Ding X. F., 2021, A congestion attack behaviour recognition method
for wireless sensor networks based on a decision tree, International Journal of Sensor
Networks, Vol. 36, No. 4, pp. 236-242

Abideen M. Z. ul, Saleem S., Ejaz M., 2019, VPN traffic detection in SSL-protected
channel, Security and Communication Networks, Vol. 2019

Bai H. W., Liu W. W., Liu G. J., Dai Y. W., Huang S. H., 2021, Application
behavior identification in DNS tunnels based on spatial-temporal information, IEEE
Access, Vol. 9, pp. 80639-80653

Liu J. Y., Wang L. T., Hu W., Gao Y. T., Cao Y. F., Lin B. J., Zhang
R., 2023, Spatial-temporal feature with dual-attention mechanism for encrypted malicious
traffic detection, Security and Communication Networks, Vol. 2023

Kattadige C., Choi K. N., Wijesinghe A., Nama A., Thilakarathna K.,
Seneviratne S., Jourjon G., 2021, SETA++: Real-time scalable encrypted traffic
analytics in multi-Gbps networks, IEEE Transactions on Network and Service Management,
Vol. 18, No. 3, pp. 3244-3259

Niktabe S., Lashkari A. H., Roudsari A. H., 2024, Unveiling DoH tunnel: Toward
generating a balanced DoH encrypted traffic dataset and profiling malicious behavior
using inherently interpretable machine learning, Peer-to-Peer Networking and Applications,
Vol. 17, No. 1, pp. 507-531

Zhang K., Deng M. J., Gong B., Miao Y. B., Ning J. T., 2024, Privacy-preserving
traceable encrypted traffic inspection in blockchain-based industrial IoT, IEEE Internet
of Things Journal, Vol. 11, No. 2, pp. 3484-3496

Abu Al-Haija Q., Al-Badawi A., 2022, Attack-aware IoT network traffic routing
leveraging ensemble learning, Sensors, Vol. 22, No. 1

Zheng X. C., Li H., 2023, Identification of malicious encrypted traffic through
feature fusion, IEEE Access, Vol. 11, pp. 80072-80080

dos Santos B. V., Vergütz A., Macedo R. T., Nogueira M., 2023, A dynamic
method to protect user privacy against traffic-based attacks on smart home, Ad Hoc
Networks, Vol. 149

Caicedo-Muñoz J. A., Espino A. L., Corrales J. C., Rendón A., 2018, QoS-classifier
for VPN and non-VPN traffic based on time-related features, Computer Networks, Vol.
144, pp. 271-279

Zhang X. Q., Zhao M., Wang J. Y., Li S., Zhou Y., Zhu S. N., 2022,
Deep-forest-based encrypted malicious traffic detection, Electronics, Vol. 11, No. 7

Yang J., Lim H., 2021, Deep learning approach for detecting malicious activities
over encrypted secure channels, IEEE Access, Vol. 9, pp. 39229-39244

Hong Y. P., Li Q., Yang Y. Q., Shen M., 2023, Graph-based encrypted malicious
traffic detection with hybrid analysis of multi-view features, Information Sciences,
Vol. 644

Lin P., Ye K. J., Hu Y. S., Lin Y. Y., Xu C. Z., 2023, A novel multimodal
deep learning framework for encrypted traffic classification, IEEE/ACM Transactions
on Networking, Vol. 31, No. 3, pp. 1369-1384

Bader O., Lichy A., Dvir A., Dubin R., Hajaj C., 2024, OSF-EIMTC: An
open-source framework for standardized encrypted internet traffic classification,
Computer Communications, Vol. 213, pp. 271-284

Kato H., Haruta S., Sasase I., 2020, Android malware detection scheme based
on level of SSL server certificate, IEICE Transactions on Information and Systems,
Vol. E103-D, No. 2, pp. 379-389

Bhati B. S., Rai C. S., 2020, Analysis of support vector machine-based intrusion
detection techniques, Arabian Journal for Science and Engineering, Vol. 45, No. 4,
pp. 2371-2383

Bhati B. S., Dikshita, Bhati N. S., Chugh G., 2022, A comprehensive study
of intrusion detection and prevention systems, Wireless Communication Security, pp.
115-142

Weijie Song was born in Shaoguan, Guangdong, China, in 1988. He received a bachelor's
degree from Guangdong University of Technology. He now works at Zhuhai Power Supply
Bureau, where he is mainly engaged in network security work.
Zufeng Hou was born in Shaoguan, Guangdong, China, in 1985. He received a master's
degree from Guangdong University of Technology. He now works at Zhuhai Power Supply
Bureau, where he is mainly engaged in the management of network security technology
for power monitoring systems.
Sixiao Guo was born in Guangdong, China, in 1988. She received a graduate degree in
science from Longbiao University. She currently works at Zhuhai Power Supply Bureau,
where she is mainly engaged in information project management and local area network management.
Zhige Liao was born in Yingde, Guangdong, China, in 1986, received an undergraduate
degree from Zhuhai University of Beijing Institute of Technology, and now works at
the Zhuhai Power Supply Bureau of Guangdong Power Grid Company, mainly on IT operation
and maintenance management and digital innovation.
Jiadong Yan was born in Yueyang, Hunan, China, in 1991. He received an undergraduate
degree from Chongqing University and a graduate degree from Southwest Jiaotong University.
He now works at Zhuhai Power Supply Bureau, where he is mainly engaged in the automation
of main network scheduling and network security work.