
IEIESPC (IEIE Transactions on Smart Processing and Computing)

IEIESPC Vol. 15, No. 2, pp. 272-283

ISSN (online): 2287-5255

Received: 16 July 2024; Revised: 15 October 2024; Accepted: 31 January 2025

DOI: https://doi.org/10.5573/IEIESPC.2026.15.2.272

Regular Paper

SSL Encryption Traffic Attack Behavior Recognition Method Based on Traffic Behavior Characteristics

Weijie Song¹,*, Zufeng Hou¹, Sixiao Guo¹, Zhige Liao¹, and Jiadong Yan¹

¹ Guangdong Power Grid Co., Ltd. Zhuhai Power Supply Bureau, Zhuhai 519075, Guangdong, China

* Corresponding Author: Weijie Song, song_weijie@hotmail.com

License: This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited (www.theieie.org).

Abstract

In recent years, cyberattackers have increasingly exploited SSL/TLS encrypted traffic to conceal their attacks, including but not limited to distributed denial-of-service (DDoS) attacks, malware propagation, data theft, and botnet control. Traditional content-based detection methods are ineffective against encrypted traffic because they cannot inspect its payload, which poses a serious threat to the safe and stable operation of network systems. To address this challenge, we propose a method for recognizing SSL encrypted traffic attack behavior based on traffic behavior characteristics. Our approach introduces advanced statistical features, such as autocorrelation functions and sliding-window statistics, to capture the dynamic behavior patterns of encrypted traffic. In the feature optimization and selection phase, we use information gain and mutual information to select the most effective feature set through recursive elimination, wrapper, and embedded strategies. For model fusion, we discuss ensemble learning methods, detailing the weight assignment and result fusion processes, and we establish an adaptive learning mechanism that combines online learning with feedback adjustment. We evaluate the prediction performance, resource consumption, and processing speed of the model within a comprehensive performance evaluation framework. The experiments use a comprehensive encrypted traffic dataset covering a wide range of normal network activities and examples of encrypted malicious behavior. Experimental results show that single models such as GBT, CNN, XGBoost, LightGBM, and ResNet perform well in terms of accuracy, recall, and F1 score. The weighted average fusion model improves performance further, and comparing multiple weight configurations shows how the weighting affects the results. Additionally, the performance of the Boosting model improves as the number of iterations increases, highlighting the effect of the iteration count on model performance. Overall, the proposed method outperforms the single models in precision, recall, and F1 score. These findings offer a robust and efficient solution for detecting and mitigating SSL/TLS encrypted traffic attacks, addressing a critical gap in current cybersecurity practice and giving network administrators practical tools to detect and block malicious activity in encrypted traffic without sacrificing privacy.

Keywords

Traffic behavior characteristics, SSL, Encrypted traffic, Attack behavior, Identification method

1. Introduction

As digital transformation accelerates, cyberspace has become an integral part of critical national infrastructure, business operations, and personal life. With the widespread adoption of emerging technologies such as cloud computing, the Internet of Things, and big data, the scale and complexity of network traffic have grown rapidly, and the network environment has become more open and interconnected. Although this interconnection greatly promotes information sharing and business innovation, it also opens the door to cyber attackers, making cybersecurity threats increasingly serious ^[1].

In the past few years in particular, the popularity of the SSL/TLS encryption protocols has made them a double-edged sword in network security. On the one hand, the widespread deployment of encryption improves the confidentiality and integrity of data in transit, protects user privacy, and counters threats such as man-in-the-middle attacks. On the other hand, encryption has also become a safe haven for malicious actors, who exploit the opacity of encrypted traffic to hide distributed denial-of-service (DDoS) attacks, advanced persistent threats (APTs), ransomware propagation, data breaches, and more. Under the cover of encryption, these attacks are harder for traditional security devices to detect and block, because such devices mostly rely on analyzing plaintext data and cannot inspect encrypted payloads. As encryption standards continue to evolve, for example with TLS 1.3, security is strengthened further, but new challenges arise: more efficient encryption and stronger privacy protection leave less observable metadata, which makes it technically harder to identify malicious behavior in encrypted traffic ^[2]. The SSL process is shown in Fig. 1.

Fig. 1. SSL process.

In summary, although SSL/TLS encryption enhances data privacy, it also provides cover for cyber attackers, and traditional security methods struggle to detect malicious activity in encrypted traffic. This leaves a significant research gap. Our objectives are to develop advanced feature extraction techniques, optimize feature selection, strengthen model fusion through ensemble learning, and evaluate the performance of the proposed method. The aim is a robust solution for detecting and mitigating encrypted traffic attacks that improves overall network security.

Faced with these challenges, academia and industry are actively studying how to identify and defend against malicious behavior in SSL/TLS encrypted traffic efficiently without compromising user privacy. This requires both a deep understanding of the underlying characteristics of encrypted traffic and the combination of advanced data analytics with intelligent algorithms to catch attack patterns that are otherwise hard to detect. Developing identification methods that protect user privacy while still resisting attacks hidden in encrypted traffic has therefore become a leading topic in network security research, with considerable theoretical significance and practical value ^[3].

Given the complexity and importance of identifying malicious activity in encrypted traffic, this study proposes a novel identification method based on traffic behavior characteristics. By analyzing the behavior patterns of encrypted traffic in depth and extracting discriminative features, the method improves the detection of hidden attack behavior and thus supports network security protection systems. The approach can increase the response speed and accuracy of network defense systems, reduce false positive and false negative rates, and help detect and block potential threats in time, protecting user data and maintaining a stable, healthy network environment.

This research designs and implements an SSL encrypted traffic attack behavior recognition system. Its core content covers four aspects. First, by mining the temporal behavior, statistical attributes, and connection patterns of SSL/TLS encrypted traffic, a highly discriminative feature set is selected and optimized so that recognition accuracy improves without sacrificing computational efficiency. Second, machine learning and deep learning techniques, such as random forests, gradient boosting trees, convolutional neural networks, and recurrent neural networks, are used to build and optimize models that learn and predict complex attack behavior patterns in encrypted traffic. Third, a real-time monitoring system is built to analyze and respond to network traffic immediately. Fourth, a comprehensive performance evaluation framework is established to verify the effectiveness of the proposed methods in practical application scenarios ^[4].

The innovations of this research are as follows. First, it integrates traditional statistical features with novel temporal behavior features to create a multi-dimensional feature space, which particularly strengthens the recognition of covert attack behavior in encrypted traffic. Second, it introduces a model fusion strategy and an adaptive learning mechanism, which improve the robustness of the model by combining the strengths of several algorithms and allow the model to optimize itself as network conditions change and new threats emerge.

2. Related Work

2.1. SSL/TLS Protocol and Its Security Features

As the cornerstone of modern Internet communication security, the SSL/TLS protocol is of obvious importance. The protocol begins with a handshake that performs secure key exchange, selects the cipher suite (which determines encryption strength), and authenticates the server to prevent man-in-the-middle attacks. After the handshake, the negotiated keys and algorithms are used to encrypt the transmission, ensuring the confidentiality and integrity of the information. Security analysis is key to evaluating the robustness of SSL/TLS. The release of the latest protocol version, TLS 1.3, marks a significant advance in security: it removes known weaknesses such as outdated encryption algorithms, simplifies the handshake ^[5], and requires forward-secret key exchange, so that historical communication content remains secure even if a long-term key is compromised in the future, greatly enhancing privacy protection. An analytical paper detailing the improvements of the new mechanism discusses the efficiency and security advantages of the new handshake in depth ^[6]. Another study analyzed the protocol with formal methods, using security protocol verification tools to prove its security under various attack models, including man-in-the-middle attacks, replay attacks, and key leakage, thereby providing a mathematically rigorous verification of the protocol's security.

Overall, the SSL/TLS protocol has evolved to withstand security threats far better than before. From the handshake mechanism to forward secrecy, each improvement aims to secure data delivery while improving the user experience and reducing risk. Security is nevertheless an ongoing process: protocol security features require constant attention and updates to keep pace with a changing threat environment.

To address these challenges, we draw on recent advancements in intrusion detection and prevention systems. Bhati and Rai ^[31] conducted an analysis of support vector machine (SVM)-based intrusion detection techniques, highlighting their effectiveness in identifying various types of cyber threats. Their study provides valuable insights into the strengths and limitations of SVMs in detecting intrusions. Additionally, Bhati et al. ^[32] presented a comprehensive study of intrusion detection and prevention systems, covering a wide range of methodologies and their practical applications. These studies underscore the importance of integrating advanced machine learning techniques and robust feature extraction methods to enhance the detection and prevention of cyberattacks. By leveraging these insights, our proposed approach aims to bridge the gap in detecting malicious activities in SSL/TLS encrypted traffic, ensuring a more secure and resilient network environment.

2.2. Encrypted Traffic Analysis Technology

In recent years, encrypted traffic analysis has made significant progress in dealing with increasingly complex network attacks, especially in feature engineering, the application of deep learning models, and the development of automated analysis tools. Recent studies have shown that by combining deep packet inspection with behavioral analysis, researchers can extract features in more dimensions, such as the distribution of inter-packet time intervals within a flow and statistical features of specific protocol fields ^[7]. One such method, which combines deep packet inspection with flow-level features, effectively improves the classification accuracy of encrypted traffic and shows the potential of feature engineering in encrypted traffic analysis. Deep learning models, especially attention mechanisms and graph neural networks (GNNs), bring new perspectives to understanding complex traffic behavior patterns ^[8]. For example, an attention-based graph neural network has been introduced that improves the accuracy and efficiency of anomaly detection by learning the complex relationships among traffic data; this approach exploits the structural characteristics of traffic data to reveal hidden patterns in traffic behavior ^[9]. To meet the demand for real-time analysis of massive volumes of encrypted traffic, automated analysis tools and platforms have become research hotspots. For example, Sarfaraz ^[10] proposes an automated analysis framework that integrates feature extraction, model training, and real-time monitoring, achieving end-to-end automation from data preprocessing to attack detection, which greatly improves analysis efficiency and response speed and reduces labor costs. With the implementation of data protection regulations such as the GDPR and CCPA, analyzing encrypted traffic without violating user privacy has become a new research direction. Liu et al. ^[11] summarize strategies and techniques for security analysis that preserve user privacy, including differential privacy and homomorphic encryption, providing a compliance path for encrypted traffic analysis.

2.3. Machine Learning and Deep Learning Models

In the broad field of encrypted traffic analysis and network security, machine learning and deep learning techniques are increasingly becoming the backbone of identifying complex attack behavior. Recent research not only deepens our understanding of these techniques but also facilitates their deployment in practical defense systems. Prior reviews of machine learning for network security show that the Support Vector Machine (SVM), with its strong classification ability in high-dimensional spaces, has become a powerful tool for distinguishing normal from abnormal traffic, while Random Forests and Gradient Boosting Trees improve the accuracy and robustness of recognizing complex patterns through model ensembling. These algorithms perform well in classification and anomaly detection, providing a solid foundation for network security analysis. Deep learning, especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), has shown considerable potential for complex pattern recognition in encrypted traffic. Yang et al. ^[12] show how these models capture time-series information while learning deep hidden features of encrypted traffic. A CNN automatically extracts features from high-dimensional data through stacked convolution operations and efficiently analyzes the structural patterns of traffic, whereas RNNs and variants such as Long Short-Term Memory (LSTM) networks use memory units to preserve the continuity of time series and capture long-term dependencies in traffic behavior, significantly improving the accuracy and granularity of attack identification ^[13].

Recent research has also focused on improving the generalization and computational efficiency of models: lightweight techniques reduce model size and speed up deployment; transfer learning reuses pre-trained models when network security data are limited; and ensemble learning strategies combine the strengths of multiple models to improve overall recognition robustness. Meanwhile, as encryption protocols continue to evolve and attack methods grow more complex, open frontier problems include extracting features efficiently while protecting privacy, enabling models to update themselves, and processing traffic in real time in resource-limited edge computing environments ^[14].

3. Feature Engineering and Optimization

3.1. Feature Selection Principle

Feature selection is the starting point of feature engineering; its core task is to identify which information is critical for distinguishing normal from malicious encrypted traffic. In this study, feature selection followed these principles:

Statistical attributes and temporal behavior characteristics: Statistical features such as the packet size distribution (mean, median, mode, skewness, and kurtosis), connection duration, and inter-packet time summarize the basic characteristics of traffic. Temporal features, such as autocorrelation coefficients, the rate of change of traffic within sliding windows, and periodicity metrics, capture how traffic evolves over time and are particularly important for identifying covert attacks ^[15].
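As a rough illustration of the packet-size distribution features listed above, the sketch below computes them for one flow with Python's standard library. The function name `packet_size_profile` and the moment-based (population) definitions of skewness and excess kurtosis are assumptions for illustration; the paper does not specify which estimators it uses.

```python
import statistics
from typing import Dict, Sequence


def packet_size_profile(sizes: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: distribution statistics of one flow's packet sizes (bytes)."""
    n = len(sizes)
    if n == 0:
        raise ValueError("flow has no packets")
    mean = statistics.fmean(sizes)
    sd = statistics.pstdev(sizes)  # population standard deviation (divides by n)
    # Central moments used by the moment-based skewness and kurtosis estimators
    m3 = sum((s - mean) ** 3 for s in sizes) / n
    m4 = sum((s - mean) ** 4 for s in sizes) / n
    return {
        "mean": mean,
        "median": statistics.median(sizes),
        "mode": statistics.mode(sizes),  # on ties, Python 3.8+ returns the first mode seen
        "skewness": m3 / sd ** 3 if sd > 0 else 0.0,  # 0.0 by convention for constant flows
        "excess_kurtosis": m4 / sd ** 4 - 3.0 if sd > 0 else 0.0,
    }


profile = packet_size_profile([1, 2, 2, 3, 7])
print(profile["median"], profile["mode"], profile["skewness"] > 0)
```

A flow dominated by small packets with an occasional large one, as in the example, produces positive skewness; a symmetric size distribution gives skewness near zero.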

3.2. Feature Set Construction

Traditional statistical features provide a macro perspective on the basic properties of network traffic. First, the mean packet size is calculated by Eq. (1), where $S_i$ is the size of the $i$th packet and $N$ is the total number of packets; it reflects the overall scale of data transmission. Next, the standard deviation, Eq. (2), describes how widely packet sizes fluctuate and thus reveals the stability of transmission. In addition, inter-arrival time (IAT) analysis uses time-series techniques to capture the rhythm of communication by computing the difference in arrival times of adjacent packets, which is critical for identifying abnormal traffic patterns ^[16].

(1)

$ \mu = \frac{\sum_{i=1}^{N} S_i}{N}, $

(2)

$ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (S_i - \mu)^2}{N}}. $
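A minimal sketch of Eqs. (1) and (2) together with the inter-arrival times described above, assuming each flow is given as parallel lists of packet sizes and sorted arrival timestamps in seconds. The input format and the function name `flow_basic_stats` are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def flow_basic_stats(sizes: Sequence[float], timestamps: Sequence[float]) -> Dict[str, object]:
    """Mean/std of packet sizes (Eqs. 1-2) plus inter-arrival times for one flow."""
    n = len(sizes)
    if n == 0 or n != len(timestamps):
        raise ValueError("need one timestamp per packet and at least one packet")
    mu = sum(sizes) / n                                        # Eq. (1)
    sigma = math.sqrt(sum((s - mu) ** 2 for s in sizes) / n)   # Eq. (2), divides by N
    # IAT: difference between arrival times of adjacent packets (timestamps assumed sorted)
    iats: List[float] = [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]
    mean_iat = sum(iats) / len(iats) if iats else 0.0
    return {"mean_size": mu, "std_size": sigma, "iats": iats, "mean_iat": mean_iat}


flow = flow_basic_stats([100, 200, 300], [0.0, 0.5, 1.5])
print(flow["mean_size"], flow["iats"], flow["mean_iat"])
```

The population form (dividing by $N$) is used to match Eq. (2); a sample estimate would divide by $N-1$ instead.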

To uncover the dynamic behavior patterns of encrypted traffic, more advanced analytical techniques are needed that capture subtle variations in time series and reveal underlying periodicity and complex dependencies. Among them, the autocorrelation function (ACF), a key tool in time-series analysis, is computed for lag $k$ as in Eq. (3), where $x_t$ is the value of the traffic series at time $t$, $\bar{x}$ is its mean, and $T$ is the series length.

(3)

$ r_k = \frac{\sum_{t=1}^{T-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}. $

By measuring the correlation between points of the sequence separated by $k$ steps, this formula reveals how data points depend on one another over time. It is especially effective at identifying periodic patterns, which makes it highly sensitive to malicious activities that follow fixed rules, such as the regular beaconing of a botnet.
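The sketch below implements the sample autocorrelation with the standard summation range $t = 1, \dots, T-k$ (0-based in code). The example series is a synthetic on/off pattern, assumed here only to show how periodicity appears as a positive peak at the matching lag.

```python
from typing import Sequence


def acf(x: Sequence[float], k: int) -> float:
    """Sample autocorrelation r_k of series x at lag k."""
    T = len(x)
    if not 0 <= k < T:
        raise ValueError("lag must satisfy 0 <= k < len(x)")
    mean = sum(x) / T
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0  # undefined for a constant series; 0.0 returned by convention here
    # Pair each point with the point k steps later: t = 0 .. T-k-1 in 0-based indexing
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(T - k))
    return num / denom


# Synthetic "beacon" series: a packet burst in every second time slot
series = [1, 0, 1, 0, 1, 0, 1, 0]
print([round(acf(series, k), 3) for k in range(4)])  # → [1.0, -0.875, 0.75, -0.625]
```

The alternating signs and the positive value at lag 2 reflect the period-2 structure; for real traffic one would scan many lags and look for such peaks.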

Sliding-window statistics further improve responsiveness to changes in the time series. A window of fixed size $w$ moves step by step along the series, and statistics such as the window mean and variance, Eqs. (4) and (5), are computed on the data inside each window, reflecting instantaneous changes and trends as they occur. This dynamic analysis is particularly effective at capturing transient but highly indicative anomalies, such as brief traffic spikes or unusual lulls, which are difficult to detect with traditional static analysis ^[17, 18].

(4)

$ \mu_w = \frac{1}{w} \sum_{i=t}^{t+w-1} x_i, $

(5)

$ \sigma_w^2 = \frac{1}{w} \sum_{i=t}^{t+w-1} (x_i - \mu_w)^2. $
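A direct, unoptimized sketch of Eqs. (4) and (5): for each start position $t$ it computes the mean and (population) variance of the $w$ values in the window. The window size and the per-second byte counts in the example are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def sliding_window_stats(x: Sequence[float], w: int) -> List[Tuple[float, float]]:
    """Return (mu_w, sigma_w^2) for every full window of length w (Eqs. 4-5)."""
    if not 1 <= w <= len(x):
        raise ValueError("window size must be between 1 and len(x)")
    out = []
    for t in range(len(x) - w + 1):
        window = x[t:t + w]
        mu = sum(window) / w                           # Eq. (4)
        var = sum((v - mu) ** 2 for v in window) / w   # Eq. (5)
        out.append((mu, var))
    return out


# Bytes per second on a connection, with a brief spike in the fourth second
traffic = [1, 1, 1, 10, 1, 1]
for t, (mu, var) in enumerate(sliding_window_stats(traffic, 3)):
    print(t, mu, var)
```

The variance jumps from 0 to 18 as soon as the spike enters the window, which is the kind of transient signal the text describes. This version costs O(n·w); maintaining running sums would make it O(n) for long captures.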

3.3. Feature Optimization and Selection

Feature optimization and selection is a key step in encrypted traffic analysis. It reduces the feature set to remove redundancy, improving model efficiency and interpretability while maintaining or improving the accuracy of identifying malicious behavior. The process relies on information-theoretic methods and feature importance evaluation to screen out the most effective feature set systematically. The overall flow is shown in Fig. 2.

Fig. 2. Feature optimization and selection.

Information gain is a powerful measure of a feature's value. It is defined as $IG(A,C) = H(C) - H(C \mid A)$, where the entropy $H(C)$ quantifies the uncertainty of the classification label $C$ and $H(C \mid A)$ is the conditional entropy of $C$ given feature $A$. Information gain therefore measures how much the uncertainty about $C$ is reduced once feature $A$ is known; the larger the gain, the stronger the feature's discriminative ability ^[19].
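To make the definition concrete, the sketch below computes $IG(A,C)$ in bits for a discrete feature. Continuous features such as mean packet size would first need to be discretized (for example, binned), a step assumed here and not specified in the paper. For discrete variables this quantity equals the mutual information $I(A;C)$.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(C) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(A, C) = H(C) - H(C | A) for a discrete feature A."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for a, y in zip(feature, labels) if a == value]
        conditional += (count / n) * entropy(subset)  # weighted H(C | A = value)
    return entropy(labels) - conditional


flow_labels = ["benign", "benign", "malicious", "malicious"]
print(information_gain(["small", "small", "large", "large"], flow_labels))  # 1.0: perfectly informative
print(information_gain(["small", "large", "small", "large"], flow_labels))  # 0.0: uninformative
```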

At the strategy level, three complementary methods are used in parallel to optimize the feature set efficiently. Recursive elimination removes, one at a time, the feature that contributes the least information gain until a preset threshold or the best model performance is reached. The wrapper method, implemented as an iterative greedy search, adds or removes features directly to optimize an evaluation metric of the whole feature set. The embedded method integrates feature selection into model training; for example, a random forest's feature importance scores indirectly guide the assessment of each feature, so that feature optimization and model learning proceed together. Working together, these three strategies build the best feature set for the task ^[20, 21].
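A minimal sketch of the recursive elimination loop, assuming a scoring callback that re-scores the remaining features after each removal (in practice this might be information gain or a model's importance scores). The feature names, the `min_features` stopping rule, and the fixed toy scores are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[List[str]], Dict[str, float]]


def recursive_elimination(features: Sequence[str], score: Scorer, min_features: int) -> List[str]:
    """Repeatedly drop the lowest-scoring feature until min_features remain."""
    if min_features < 1:
        raise ValueError("min_features must be at least 1")
    selected = list(features)
    while len(selected) > min_features:
        scores = score(selected)  # re-score: importances may change once a feature is gone
        weakest = min(selected, key=lambda f: scores[f])
        selected.remove(weakest)
    return selected


# Hypothetical fixed scores; a real scorer would recompute them on the current subset
toy_scores = {"mean_size": 0.9, "acf_lag1": 0.7, "iat_std": 0.2, "ttl": 0.05}
print(recursive_elimination(list(toy_scores), lambda fs: {f: toy_scores[f] for f in fs}, 2))
```

Re-scoring inside the loop is what distinguishes recursive elimination from a one-shot ranking: with correlated features, removing one can raise or lower the importance of the others.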

(6)

$ CV = \frac{1}{k} \sum_{i=1}^{k} error_i. $

Finally, cross-validation, the standard evaluation strategy, is used to assess the performance of different feature subsets comprehensively, as in Eq. (6), where $error_i$ is the test error of the $i$th fold and $k$ is the number of folds. In this way, the feature set that is stable and optimal across different data slices can be selected objectively, ensuring that the model generalizes to unseen data and thus improving the accuracy and reliability of encrypted traffic analysis.
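The sketch below follows Eq. (6): it partitions the sample indices into $k$ folds, trains on $k-1$ of them, records the error on the held-out fold, and averages. The `train_and_eval` callback is a hypothetical stand-in for fitting any model on a candidate feature subset; the folds are contiguous for clarity, whereas real experiments would normally shuffle (and, for imbalanced traffic, stratify) first.

```python
from typing import Callable, List


def k_fold_cv_error(n_samples: int, k: int,
                    train_and_eval: Callable[[List[int], List[int]], float]) -> float:
    """CV = (1/k) * sum_i error_i, as in Eq. (6)."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    errors, start = [], 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        errors.append(train_and_eval(train_idx, test_idx))  # error_i of fold i
        start += size
    return sum(errors) / k


# Toy callback: the "error" is just the fraction of data held out, to show the mechanics
print(k_fold_cv_error(10, 3, lambda train, test: len(test) / 10))
```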

4. Multi-model Fusion and Adaptive Learning Mechanism

4.1. Model Selection and Preprocessing

The detailed model framework of this paper is shown in Fig. 3. Specifically, we fuse Gradient Boosting Trees (GBT), Convolutional Neural Networks (CNN), and online learning and feedback adjustment mechanisms to achieve efficient feature extraction, iterative optimization, and dynamic adaptation for complex data.

Fig. 3. Model framework.

Gradient Boosting Trees (GBT) are a powerful tool in this field, especially for problems that are not linearly separable. GBT builds a sequence of weak learners (usually decision trees) step by step, with each step correcting the errors of the previous ones. Specifically, each new tree is fitted to the negative gradient of the loss with respect to the current predictions (for squared loss, simply the residuals), so that it concentrates on the samples the current ensemble predicts poorly, and the model is updated iteratively to minimize the loss function. This process is summarized in Eq. (7) ^[22, 23].

(7)

$ F_m(x) = F_{m-1}(x) + \rho_m h_m(x), \quad \text{with } (\rho_m, h_m) = \arg\min_{\rho,h} \sum_{i=1}^{N} L(y_i, F_{m-1}(x_i) + \rho h(x_i)), $

where $F_m(x)$ is the prediction of the model after $m$ boosting stages, $h_m(x)$ is the newly added weak learner, $L$ is the loss function, and $\rho_m$ is the step size (learning rate); the optimization aims to minimize the overall loss of the model.
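To illustrate Eq. (7), the sketch below boosts one-split regression trees (stumps) under squared loss on a single numeric feature. Two simplifications are assumed: the negative gradient of $\tfrac{1}{2}(y - F)^2$ is the residual $y - F$, so each stump is fitted to the residuals, and $\rho_m$ is replaced by a fixed shrinkage rate `lr` instead of a line search. It is a teaching sketch, not the GBT configuration used in the experiments.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fit_stump(x: Sequence[float], r: Sequence[float]) -> Model:
    """Least-squares stump: one threshold, constant prediction on each side."""
    best = None
    for thr in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lv = sum(left) / len(left)  # never empty: thr is one of the x values
        rv = sum(right) / len(right) if right else lv
        sse = sum((ri - (lv if xi <= thr else rv)) ** 2 for xi, ri in zip(x, r))
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda v: lv if v <= thr else rv


def gradient_boost(x: Sequence[float], y: Sequence[float],
                   n_rounds: int = 50, lr: float = 0.1) -> Model:
    f0 = sum(y) / len(y)  # F_0: the constant that minimizes squared loss
    stumps: List[Model] = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]     # negative gradient
        h = fit_stump(x, residuals)                           # weak learner h_m
        stumps.append(h)
        pred = [pi + lr * h(xi) for pi, xi in zip(pred, x)]  # F_m = F_{m-1} + lr * h_m
    return lambda v: f0 + lr * sum(h(v) for h in stumps)


model = gradient_boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(round(model(1), 3), round(model(6), 3))
```

Each round shrinks the remaining residual by a factor of `1 - lr` on this toy data, which mirrors the paper's observation that Boosting performance improves with the number of iterations.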

Convolutional Neural Networks (CNNs) show great potential in traffic analysis, especially for identifying local patterns and sequence features. A CNN uses convolutional layers to detect local features; weight sharing reduces the number of parameters and provides translation invariance, improving model efficiency. For traffic data, a CNN can extract meaningful features from the temporal arrangement of packets, such as specific protocol patterns or abnormal communication patterns. Its core convolution operation is expressed in Eq. (8), where $Z_j$ denotes the $j$th output feature map, $W_j$ the convolution kernel, $*$ the convolution operation, $b_j$ the bias, and $f$ the activation function; through this operation, a CNN captures spatial and temporal features in the data ^[24].

(8)

$ Z_j = f(W_j * X + b_j). $
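A minimal sketch of Eq. (8) for a single one-dimensional kernel over a sequence of (already normalized) packet features, with ReLU assumed as the activation $f$. Like most deep learning libraries, it computes a "valid" cross-correlation (the kernel is not flipped); the kernel values in the example are hand-picked rather than learned.

```python
from typing import Callable, List, Sequence


def relu(z: float) -> float:
    return max(0.0, z)


def conv1d_feature_map(x: Sequence[float], kernel: Sequence[float], bias: float,
                       activation: Callable[[float], float] = relu) -> List[float]:
    """Z_j = f(W_j * X + b_j) for one kernel, 'valid' padding, stride 1."""
    k = len(kernel)
    if not 1 <= k <= len(x):
        raise ValueError("kernel must be non-empty and no longer than the input")
    return [activation(sum(w * x[i + j] for j, w in enumerate(kernel)) + bias)
            for i in range(len(x) - k + 1)]


# A [-1, 1] kernel responds where the input steps up, e.g. the start of a burst
print(conv1d_feature_map([0, 0, 1, 1, 0, 0], [-1, 1], 0.0))  # → [0.0, 1.0, 0.0, 0.0, 0.0]
```

A real CNN learns many such kernels jointly and stacks layers; this shows only what one output map computes.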

Preprocessing steps are also indispensable, including data cleaning, normalization, standardization, and missing-value handling, so that the model can learn effectively. For example, standardization subtracts the mean and divides by the standard deviation, as in Eq. (9), rescaling each feature to zero mean and unit variance, which improves convergence speed and stability ^[25].

(9)

$ x' = \frac{x - \mu}{\sigma}. $
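A small sketch of Eq. (9), assuming the statistics are estimated on the training split only and then reused for validation and test data, so that no information leaks from held-out data. The guard for zero variance is an added assumption.

```python
import statistics
from typing import Callable, Sequence


def fit_standardizer(train_column: Sequence[float]) -> Callable[[float], float]:
    """Return x -> (x - mu) / sigma using training-set statistics (Eq. 9)."""
    mu = statistics.fmean(train_column)
    sigma = statistics.pstdev(train_column)
    if sigma == 0:
        sigma = 1.0  # constant feature: only center it instead of dividing by zero
    return lambda x: (x - mu) / sigma


scale = fit_standardizer([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, standard deviation 2
print(scale(9), scale(5), scale(1))
```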

In summary, model selection and preprocessing are key to building an efficient encrypted traffic analysis system. The iterative optimization of gradient boosting trees and the spatio-temporal feature extraction of CNNs, combined with an appropriate preprocessing strategy, can greatly improve the model's recognition ability and provide a solid foundation for security protection.

4.2. Model Fusion Strategy

In the complex environment of encrypted traffic analysis, a single model often struggles to capture all types of attack behavior and patterns. Model fusion is therefore a key means of improving prediction performance, robustness, and generalization. Ensemble learning combines the predictions of multiple base models so that, through their diversity and complementarity, the ensemble outperforms any single model. Weighted averaging is one of the most intuitive fusion methods: each model is assigned a weight, and the final prediction is the weighted average of the individual predictions. With $M$ models whose predictions are $\hat{y}_1, \hat{y}_2, \dots, \hat{y}_M$ and whose weights $w_1, w_2, \dots, w_M$ sum to 1, the final prediction is given by Eq. (10); in the equal-weight case, $w_m = 1/M$ for every model ^[26].

(10)

$ \hat{y}_{ensemble} = \sum_{m=1}^{M} w_m \hat{y}_m. $
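A direct sketch of Eq. (10) for binary detection, where each base model outputs a probability that a flow is malicious. The weights are normalized inside the function so they need not sum to exactly 1; the example weights and probabilities are made up for illustration.

```python
from typing import Sequence


def weighted_average(preds: Sequence[float], weights: Sequence[float]) -> float:
    """y_ensemble = sum_m w_m * y_m with weights normalized to sum to 1 (Eq. 10)."""
    if len(preds) != len(weights) or not preds:
        raise ValueError("need one weight per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * p for w, p in zip(weights, preds)) / total


probs = [0.9, 0.6, 0.3]  # e.g. outputs of three base models for one flow
print(weighted_average(probs, [0.5, 0.3, 0.2]))  # performance-based weights
print(weighted_average(probs, [1, 1, 1]))        # equal weights, w_m = 1/M
```

Comparing several weight vectors on a validation set is one simple way to study how weighting affects the fused result.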

We adopt Boosting as the ensemble learning framework; it strengthens the model gradually by training weak classifiers iteratively, each round focusing on the samples misclassified in the previous round. Weight assignment and result fusion further refine the integration strategy. In practice, rather than assigning equal weights, weights are often adjusted dynamically according to each model's performance to optimize overall results. Stacking is a more advanced fusion strategy that trains a meta-model (usually logistic or linear regression) on the outputs of the base models, as in Eq. (11) ^[28].

(11)

$ \hat{y}_{stack} = \text{argmax}_c P(c|\hat{y}_1, \hat{y}_2, ..., \hat{y}_M), $

where $P(c \mid \hat{y}_1, \hat{y}_2, \dots, \hat{y}_M)$ is the meta-model's predicted probability for class $c$, based on the outputs of all base models.
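A minimal stacking sketch for Eq. (11) in the binary case, using a logistic-regression meta-model trained by batch gradient descent on base-model probabilities. In practice the meta-model should be trained on out-of-fold predictions so that it does not learn the base models' overfitting; the toy rows below are invented and assumed to be such out-of-fold outputs.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_model(rows: Sequence[Sequence[float]], labels: Sequence[int],
                     lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit P(c=1 | y_1..y_M) = sigmoid(b + sum_m v_m * y_m) by minimizing mean log loss."""
    m, n = len(rows[0]), len(rows)
    v, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_v, grad_b = [0.0] * m, 0.0
        for row, y in zip(rows, labels):
            err = sigmoid(b + sum(vj * xj for vj, xj in zip(v, row))) - y  # dLoss/dz
            for j in range(m):
                grad_v[j] += err * row[j]
            grad_b += err
        v = [vj - lr * g / n for vj, g in zip(v, grad_v)]
        b -= lr * grad_b / n
    return v, b


def stack_predict(v: Sequence[float], b: float, row: Sequence[float]) -> int:
    """argmax_c P(c | base outputs): class 1 when its probability is at least 0.5."""
    return int(sigmoid(b + sum(vj * xj for vj, xj in zip(v, row))) >= 0.5)


# Columns: P(malicious) from base model 1 and base model 2 (hypothetical values)
rows = [[0.9, 0.5], [0.8, 0.4], [0.2, 0.6], [0.1, 0.5], [0.95, 0.3], [0.15, 0.7]]
labels = [1, 1, 0, 0, 1, 0]
v, b = train_meta_model(rows, labels)
print([stack_predict(v, b, r) for r in rows], [round(x, 2) for x in v])
```

Unlike fixed weights, the learned coefficients `v` let the meta-model discover which base model to trust, and by how much.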

Results can also be fused by voting, either hard voting (selecting the class with the most votes) or soft voting (averaging the predicted probabilities), both of which are effective in classification tasks. For example, soft voting can be expressed as Eq. (12), where $P_m(c \mid \hat{y}_m)$ is the predicted probability of the $m$th model for class $c$.

(12)

$ P(c) = \frac{1}{M} \sum_{m=1}^{M} P_m(c|\hat{y}_m). $
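The sketch below contrasts hard voting with the soft voting of Eq. (12). Each base model is assumed to return a dictionary of class probabilities; the three example outputs are invented to show a case where the two rules disagree.

```python
from collections import Counter
from typing import Dict, List, Tuple


def hard_vote(prob_dists: List[Dict[str, float]]) -> str:
    """Each model votes for its most probable class; the most common vote wins."""
    votes = [max(p, key=p.get) for p in prob_dists]
    return Counter(votes).most_common(1)[0][0]  # ties go to the first-seen class


def soft_vote(prob_dists: List[Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Average class probabilities over the M models (Eq. 12) and take the argmax."""
    M = len(prob_dists)
    avg = {c: sum(p[c] for p in prob_dists) / M for c in prob_dists[0]}
    return max(avg, key=avg.get), avg


outputs = [
    {"benign": 0.55, "malicious": 0.45},
    {"benign": 0.55, "malicious": 0.45},
    {"benign": 0.05, "malicious": 0.95},
]
print(hard_vote(outputs))     # two of three models lean benign
print(soft_vote(outputs)[0])  # one confident model tips the average to malicious
```

Soft voting uses each model's confidence, which helps when one detector is highly certain about an attack the others barely miss; hard voting is more robust to a single poorly calibrated model.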

In summary, through well-designed ensemble learning methods and results fusion strategies, we can effectively integrate the predictive capabilities of multiple models, not only improving the accuracy and robustness of encrypted traffic analysis, but also adapting to a wider range of security challenges to ensure reliable protection of network environments.

4.3. Adaptive Learning Mechanism

Online learning means that the model updates its weights and parameters in real time as it receives new data, without having to retrain the entire model. This requires algorithms with low-latency update capability to respond quickly to new samples. The update formula for online learning can be expressed as Eq. (13) ^[29].

(13)

$ w_{t+1} = w_t - \eta \nabla_t, $

where $w_t$ is the model parameter vector at time $t$, $\eta$ is the learning rate, and $\nabla_t$ is the gradient of the loss computed on the current sample. The formula expresses the iterative parameter update that lets the model evolve in real time with the incoming data stream.
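A minimal sketch of Eq. (13) applied to a logistic-regression detector that is updated one labelled flow at a time. The class name, the feature vectors, and the learning rate are illustrative assumptions; the gradient used is that of the log loss, for which $\nabla_t = (p_t - y_t)\,x_t$.

```python
import math
from typing import List, Sequence


class OnlineLogisticDetector:
    """Hypothetical streaming detector: w_{t+1} = w_t - eta * grad_t (Eq. 13)."""

    def __init__(self, n_features: int, eta: float = 0.1) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.eta = eta

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x: Sequence[float], y: int) -> None:
        err = self.predict_proba(x) - y  # derivative of log loss w.r.t. the logit
        self.w = [wi - self.eta * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.eta * err


detector = OnlineLogisticDetector(n_features=2)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200  # alternating labelled samples
for features, label in stream:
    detector.partial_fit(features, label)  # one cheap update per sample, no retraining
print(detector.predict_proba([1.0, 0.0]) > 0.5, detector.predict_proba([0.0, 1.0]) < 0.5)
```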

Feedback adjustment is a key component of adaptive learning. Through performance monitoring and evaluation, it collects prediction errors and misclassified instances and feeds them back into the model adjustment strategy. Adjustments may include learning rate decay, stronger regularization, or further feature selection optimization. The specific update depends on the strategy; a general form is shown in Eq. (14).

(14)

$ w_{t+1} = w_t - \eta_t \left( \frac{\partial L(w_t)}{\partial w_t} + \lambda \frac{\partial R(w_t)}{\partial w_t} \right). $

Here $L$ is the loss function, $R$ is the regularization term, $\lambda$ controls the regularization strength, and $\eta_t$ is a learning rate that may decay over time. Feedback adjustment acts on the parameter update through $\eta_t$ and $\lambda$, helping the model absorb new knowledge and correct its errors.
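A sketch of one feedback-adjusted update in the common L2 (weight-decay) form, where the regularizer $R(w) = \tfrac{1}{2}\lVert w\rVert^2$ enters through its gradient $w$, together with an inverse-time learning-rate decay for $\eta_t$. Both the choice of L2 regularization and the decay schedule are assumptions for illustration; the paper does not fix either.

```python
from typing import List, Sequence


def decayed_learning_rate(eta0: float, decay: float, t: int) -> float:
    """Inverse-time decay: eta_t = eta0 / (1 + decay * t)."""
    return eta0 / (1.0 + decay * t)


def regularized_step(w: Sequence[float], grad_loss: Sequence[float],
                     eta_t: float, lam: float) -> List[float]:
    """w_{t+1} = w_t - eta_t * (dL/dw + lam * dR/dw) with R(w) = 0.5 * ||w||^2."""
    return [wi - eta_t * (gi + lam * wi) for wi, gi in zip(w, grad_loss)]


eta_t = decayed_learning_rate(eta0=0.1, decay=0.01, t=100)  # halves the rate after 100 steps
print(eta_t, regularized_step([1.0, -2.0], [0.0, 0.0], eta_t=0.1, lam=0.5))
```

With a zero loss gradient the step simply shrinks every weight by the factor $1 - \eta_t\lambda$ (here 0.95), which is why raising $\lambda$ is a lever for damping a model that has started to overfit recent feedback.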

The model update and optimization strategy focuses on how to efficiently utilize new data throughout the model lifecycle, either through incremental learning, periodic retraining, or model fusion of old and new models. For example, a simplified version of the update formula for incremental learning may be Eq. (15) ^[30].

(15)

$ w_{new} = w_{old} + \delta w. $

$\delta w$ represents parameter increments calculated based on new data sets. In this way, the model retains historical learning while incorporating new knowledge, reducing the computational burden.
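One concrete instance of the $w_{new} = w_{old} + \delta w$ pattern in Eq. (15) is updating the standardization statistics of Eq. (9) as new flows arrive, instead of recomputing them over all historical data. The sketch below uses Welford's online algorithm; applying it to the preprocessing statistics is an illustrative choice, not something the paper specifies.

```python
class RunningStats:
    """Welford's algorithm: incremental mean and population variance."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n          # mean_new = mean_old + increment
        self._m2 += delta * (x - self.mean)  # uses both the old and the updated mean

    @property
    def variance(self) -> float:
        return self._m2 / self.n if self.n else 0.0


running = RunningStats()
for size in [2, 4, 4, 4, 5, 5, 7, 9]:  # e.g. packet sizes arriving one by one
    running.update(size)
print(running.mean, running.variance)
```

Each update costs O(1) time and memory, so the statistics track the full history without storing it.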

In summary, through online learning, feedback adjustment, and model update and optimization strategies, the adaptive learning mechanism lets the encrypted traffic analysis system evolve continuously, maintaining high efficiency and accuracy as the threat environment changes and adjusting its strategy in time to keep network protection effective and reliable over the long term. These mechanisms not only make the analysis more real-time but also bring more dynamic and proactive defense to network security.

5. Comprehensive Performance Evaluation and Dynamic Adjustment

5.1. Performance Assessment Framework

In encrypted traffic analysis, building efficient and reliable models is not only a matter of high prediction accuracy; a comprehensive performance evaluation framework is also required to ensure the model's effectiveness, efficiency, and sustainability in practical applications. The framework should cover multiple dimensions, including but not limited to predictive performance metrics, resource consumption, and processing speed. These elements are elaborated below.

Accuracy is the most intuitive evaluation metric, defined as the proportion of correctly classified samples among all samples. On class-imbalanced datasets, however, accuracy can be misleading: when malicious flows are rare, a model that labels almost everything as benign still achieves high accuracy. Recall is the proportion of actual positive (malicious) samples that the model correctly identifies, and precision is the proportion of samples predicted as positive that are truly positive. The F1 score is the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$, and provides a single metric that balances the two. It therefore reflects the overall performance of the model more faithfully when classes are imbalanced; a higher F1 score indicates a better balance between precision and recall.

Resource consumption covers the hardware requirements of model training and prediction, such as CPU and GPU utilization, memory footprint, and disk space. In actual deployment, resource efficiency directly affects system scalability and cost-effectiveness. Model training, for example, can demand substantial computation and time, so reducing training time, optimizing memory usage, and lowering power consumption are important considerations.

Processing speed, or inference time, is the time the model needs to make predictions on a single sample or a batch of samples. In real-time analytics, rapid response is decisive, and low latency allows security threats to be identified and handled promptly. Slow processing not only degrades the user experience but can also delay the response to security incidents, widening the impact of an attack.
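The imbalance problem can be made concrete with hypothetical confusion-matrix counts. On a test set with 5 malicious and 95 benign flows, a detector that labels every flow as benign scores 0.95 accuracy but 0 recall and 0 F1, while a detector that catches 4 of the 5 attacks at the cost of 6 false alarms has slightly lower accuracy (0.93) but an F1 score of about 0.53. The counts are illustrative only.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical test set: 5 malicious flows (positive class), 95 benign flows.
always_benign = metrics(tp=0, fp=0, fn=5, tn=95)
useful_detector = metrics(tp=4, fp=6, fn=1, tn=89)

print("always benign:   acc=%.2f prec=%.2f rec=%.2f f1=%.2f" % always_benign)
print("useful detector: acc=%.2f prec=%.2f rec=%.2f f1=%.2f" % useful_detector)
```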

5.2. Experimental Design and Analysis of Results

5.2.1 Data set description and experimental setup

This evaluation uses a comprehensive encrypted traffic dataset covering a wide range of normal network activity and examples of encrypted malicious behavior, which provides rich learning material for the model. The data is split into a training set (70%), a validation set (15%), and a test set (15%), used for model training, model tuning, and final performance testing, respectively. The experiments are implemented in Python with Scikit-Learn, TensorFlow, and PyTorch and run on high-performance computing facilities.
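A 70/15/15 split can be produced with two calls to scikit-learn's `train_test_split`: first hold out 30% of the data, then divide that portion equally into validation and test sets. Stratifying on the label is an assumption here; it keeps the ratio of malicious to benign flows similar across the three subsets. `X` and `y` are synthetic placeholders for the dataset's feature matrix and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))            # placeholder flow features
y = (rng.random(1000) < 0.2).astype(int)   # placeholder labels, about 20% malicious

# Step 1: 70% training, 30% held out.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
# Step 2: split the held-out 30% in half, giving 15% validation and 15% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=42)

print(len(X_train), len(X_val), len(X_test))
```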

5.2.2 Model performance comparison and case study

To further examine the advantages of the proposed scheme, we designed comparative experiments covering the GBT and CNN single models, ensemble learning models in different configurations (weighted average fusion, Bagging, Boosting, and Stacking), and several state-of-the-art models in the field (XGBoost, LightGBM, and ResNet). Predictive performance was evaluated in terms of accuracy, recall, and F1 score, and the resource consumption and real-time processing capability of the models were also investigated. The experimental results, including the different weight configurations, iteration counts, and meta-model options, are detailed below.

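Table 1 shows the performance of the single and comparative models. The GBT model achieves an accuracy of 0.89, a recall of 0.87, and an F1 score of 0.88. The CNN model achieves an accuracy of 0.92, a recall of 0.86, and an F1 score of 0.89. The XGBoost model achieves an accuracy of 0.91, a recall of 0.88, and an F1 score of 0.89. LightGBM (0.90, 0.89, 0.89) and ResNet (0.93, 0.85, 0.89) reach the same F1 score; ResNet has the highest accuracy but the lowest recall. All five single models therefore have F1 scores between 0.88 and 0.89.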

Table 1. Performance evaluation of single model and comparative model.

Model type	Accuracy	Recall rate	F1 score
GBT	0.89	0.87	0.88
CNN	0.92	0.86	0.89
XGBoost	0.91	0.88	0.89
LightGBM	0.90	0.89	0.89
ResNet	0.93	0.85	0.89

Table 2 shows the performance of the weighted average fusion model under several weight configurations. With weights of 0.4 for GBT and 0.6 for CNN, the model achieves an accuracy of 0.93, a recall of 0.90, and an F1 score of 0.91. With equal weights (0.5 each), it achieves 0.92, 0.89, and 0.90. With weights of 0.3 for GBT and 0.7 for CNN, accuracy drops to 0.91 while recall rises to 0.91, giving an F1 score of 0.91. All three configurations exceed either base model alone in F1 score (0.88 for GBT and 0.89 for CNN), and the choice of weights shifts the trade-off between accuracy and recall.

Table 2. Performance of weighted average fusion model with multiple weight configurations.

Weight configuration	Accuracy	Recall rate	F1 score
GBT 0.4, CNN 0.6	0.93	0.90	0.91
GBT 0.5, CNN 0.5	0.92	0.89	0.90
GBT 0.3, CNN 0.7	0.91	0.91	0.91
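Weighted average fusion can be sketched as a convex combination of the base models' predicted probabilities followed by a threshold. The probability arrays below are made-up stand-ins for GBT and CNN outputs, the 0.5 threshold is an assumption, and the weight pairs mirror those in Table 2.

```python
import numpy as np

def weighted_fusion(probs, weights, threshold=0.5):
    """Fuse per-model malicious-class probabilities with a weighted average.

    probs:   array-like of shape (n_models, n_samples)
    weights: one weight per model; normalised here so they sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = w @ probs                      # weighted average for every sample
    return fused, (fused >= threshold).astype(int)

# Hypothetical outputs of the two base models for four flows.
p_gbt = [0.80, 0.30, 0.30, 0.55]
p_cnn = [0.70, 0.65, 0.20, 0.35]

for w_gbt, w_cnn in [(0.4, 0.6), (0.5, 0.5), (0.3, 0.7)]:
    fused, labels = weighted_fusion([p_gbt, p_cnn], [w_gbt, w_cnn])
    print(w_gbt, w_cnn, fused.round(3), labels)
```

In this toy input the second flow is flagged as malicious under the 0.4/0.6 and 0.3/0.7 weightings but not under equal weights, which is the kind of sensitivity to weight configuration that Table 2 reflects. Normalising the weights inside the function keeps the fused score on a probability scale even if the configured weights do not sum exactly to 1.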

Table 3 shows the performance of the Boosting model for different numbers of iterations. With 25 iterations, the model achieves an accuracy of 0.92, a recall of 0.90, and an F1 score of 0.91. With 50 iterations, these rise to 0.94, 0.92, and 0.93. With 75 iterations, recall improves slightly to 0.93 while accuracy and F1 score remain at 0.94 and 0.93, suggesting that performance largely saturates beyond 50 iterations.

Table 3. Boosting model performance for different iterations.

Number of iterations	Accuracy	Recall rate	F1 score
25	0.92	0.90	0.91
50	0.94	0.92	0.93
75	0.94	0.93	0.93
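The effect of the iteration count can be examined without refitting the model for each setting: scikit-learn's `GradientBoostingClassifier.staged_predict` yields the ensemble's predictions after each boosting iteration. In this sketch, gradient boosting is assumed as the Boosting variant, and the data (from `make_classification`) and hyperparameters are synthetic placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced stand-in for encrypted-flow features.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Fit once with the largest iteration count under study.
gb = GradientBoostingClassifier(n_estimators=75, random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., 75 boosting iterations.
results = {}
for n_iter, y_pred in enumerate(gb.staged_predict(X_te), start=1):
    if n_iter in (25, 50, 75):
        results[n_iter] = (accuracy_score(y_te, y_pred),
                           recall_score(y_te, y_pred),
                           f1_score(y_te, y_pred))

for n_iter, (acc, rec, f1) in results.items():
    print(f"{n_iter:>3} iterations: accuracy={acc:.3f} recall={rec:.3f} F1={f1:.3f}")
```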

Table 4 shows the performance of Stacking with different meta-models. With a logistic regression (LR) meta-model, the ensemble achieves an accuracy of 0.95, a recall of 0.94, and an F1 score of 0.94. With an SVM meta-model, it achieves 0.94, 0.93, and 0.93, and with a Random Forest meta-model, 0.95, 0.93, and 0.94. The LR meta-model matches or exceeds the other two on every metric, and all three Stacking configurations exceed the single models in Table 1.

Table 4. Stacking performance with different meta-models.

Meta-model	Accuracy	Recall rate	F1 score
LR	0.95	0.94	0.94
SVM	0.94	0.93	0.93
Random Forest	0.95	0.93	0.94
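A Stacking ensemble with interchangeable meta-models can be sketched with scikit-learn's `StackingClassifier`, where `final_estimator` is the meta-model trained on cross-validated base-model predictions. The base learners (gradient boosting, plus a small multilayer perceptron standing in for the CNN), the synthetic data, and all hyperparameters are assumptions for illustration, not the paper's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=1)

base_models = [
    ("gbt", GradientBoostingClassifier(n_estimators=50, random_state=1)),
    # A small MLP stands in for the CNN branch (assumption).
    ("nn", make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                       random_state=1))),
]

meta_models = {
    "LR": LogisticRegression(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=1),
}

scores = {}
for name, meta in meta_models.items():
    # cv=5: the meta-model is trained on out-of-fold base-model predictions.
    stack = StackingClassifier(estimators=base_models, final_estimator=meta, cv=5)
    stack.fit(X_tr, y_tr)
    scores[name] = f1_score(y_te, stack.predict(X_te))
    print(f"{name:<13} F1={scores[name]:.3f}")
```

Training the meta-model on out-of-fold predictions rather than on predictions for data the base models have already seen reduces the risk that it simply learns to trust overfitted base models.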

Fig. 4. Resource consumption and processing speed assessment.

Fig. 4 compares the resource consumption and processing speed of different models. The GBT model has an average training time of 120 minutes, an average prediction time of 3.5 ms/sample, and a memory footprint of 2.1 GB. The CNN model has an average training time of 300 minutes, an average prediction time of 2.0 ms/sample, and a memory footprint of 4.5 GB. The Stacking (LR) model has an average training time of 450 minutes, an average prediction time of 4.2 ms/sample, and a memory footprint of 6.2 GB. The higher predictive performance of Stacking thus comes at the cost of the longest training time, the slowest prediction, and the largest memory footprint, whereas the CNN offers the fastest prediction.
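Per-sample prediction latency of the kind reported in Fig. 4 can be estimated by timing inference with `time.perf_counter` and dividing by the number of samples. The model and data below are synthetic placeholders; measured values depend heavily on hardware, batch size, and library versions, so they are not comparable with the figures above.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

def ms_per_sample(model, X, repeats=5):
    """Median wall-clock prediction time per sample, in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        model.predict(X)
        timings.append((time.perf_counter() - start) / len(X) * 1000.0)
    return float(np.median(timings))

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

X_batch = X[:500]
print(f"batch of 500:  {ms_per_sample(model, X_batch):.4f} ms/sample")
print(f"single sample: {ms_per_sample(model, X_batch[:1]):.4f} ms/sample")
```

Timing a single sample as well as a batch shows how much of the latency is fixed per-call overhead, which matters in real-time detection where flows may need to be classified one at a time. Taking the median over several repeats damps the effect of occasional scheduling noise.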

6. Conclusion

This paper proposes a comprehensive and effective solution for encrypted traffic analysis, comprising feature engineering and optimization, multi-model fusion, and an adaptive learning mechanism. In the feature engineering phase, the principles of feature selection are clarified, including the importance of statistical attributes and temporal behavior features, with emphasis on features that can reveal the subtle differences needed to recognize covert attack behavior. In addition to traditional statistical features, the autocorrelation function and sliding window statistics are introduced to capture the dynamic behavior patterns of traffic. In the feature optimization and selection phase, information gain and mutual information are used to select the most effective feature set through recursive reduction, wrapper, and embedded strategies. For model fusion, this study discusses the implementation of ensemble learning, weight assignment, and result fusion. Ensemble learning combines the predictions of multiple base models and, through their diversity and complementarity, outperforms any single model. Weight assignment and result fusion refine the integration strategy by assigning a weight to each model and computing a weighted average prediction. The adaptive learning mechanism, which combines online learning and feedback adjustment with a model update and optimization strategy, allows the system to evolve continuously and cope with changing network threats. Experimental results show that the proposed method performs well in accuracy, recall, and F1 score, verifying its effectiveness in identifying malicious traffic.

With our approach, network administrators can improve their ability to detect attacks hidden in encrypted traffic and more accurately identify malicious behaviors such as DDoS attacks, APT attacks, ransomware propagation, and data leakage. This improves the overall security of the system and reduces false positives and false negatives, allowing administrators to respond to and handle potential threats more quickly. In addition, the adaptive learning mechanism lets the system keep optimizing itself and adapting to new attack patterns, helping to maintain protection over the long term.

Compared with existing approaches, our method offers advantages in several respects. First, advanced feature extraction techniques such as the autocorrelation function and sliding window statistics capture more complex traffic behavior patterns, which improves detection accuracy. Second, the feature optimization and selection process uses information gain and mutual information to select the most effective feature set, reducing redundancy and noise. Finally, combining ensemble learning with an adaptive learning mechanism makes the system more stable and robust when processing large-scale, highly complex encrypted traffic. Experimental results show that our approach outperforms the single-model baselines in accuracy, recall, and F1 score, offering a more effective means of protecting networks against attacks concealed in encrypted traffic.

References

Zhou K., Wang W. Y., Wu C. H., Hu T., 2020, Practical evaluation of encrypted traffic classification based on a combined method of entropy estimation and neural networks, ETRI Journal, Vol. 42, No. 3, pp. 311-323

Ma C. C., Du X. H., Cao L. F., 2020, Improved KNN algorithm for fine-grained classification of encrypted network flow, Electronics, Vol. 9, No. 2

Belmoukadam O., Barakat C., 2021, Unveiling the end-user viewport resolution from encrypted video traces, IEEE Transactions on Network and Service Management, Vol. 18, No. 3, pp. 3324-3335

Zang X. D., Gong J., Wang M. L., Gao P., Zhang G. W., 2023, IP traffic behavior characterization via semantic mining, Journal of Network and Computer Applications, Vol. 213, pp. 103603

Zeng X. M., Chen X. S., Shao G. L., He T., Han Z. H., Wen Y., Wang Q. X., 2019, Flow context and host behavior based Shadowsocks's traffic identification, IEEE Access, Vol. 7, pp. 41017-41032

Bhardwaj S., Dave M., 2023, Enhanced neural network-based attack investigation framework for network forensics: Identification, detection, and analysis of the attack, Computers & Security, Vol. 135, pp. 103521

Canavese D., Regano L., Basile C., Ciravegna G., Lioy A., 2022, Encryption-agnostic classifiers of traffic originators and their application to anomaly detection, Computers & Electrical Engineering, Vol. 97

Papadogiannaki E., Ioannidis S., 2021, Acceleration of intrusion detection in encrypted network traffic using heterogeneous hardware, Sensors, Vol. 21, No. 4

Hao L. P., Ma Y. H., 2023, Spoofing traffic attack recognition algorithm for wireless communication networks in a smart city based on improved machine learning, Journal of Testing and Evaluation

Sarfaraz A., Khan A., 2018, Feature selection based correlation attack on HTTPS secure searching, Wireless Personal Communications, Vol. 103, No. 4, pp. 2995-3008

Liu C., Xiong G., Gou G. P., Yiu S. M., Li Z., Tian Z. H., 2021, Classifying encrypted traffic using adaptive fingerprints with multi-level attributes, World Wide Web-Internet and Web Information Systems, Vol. 24, No. 6, pp. 2071-2097

Yang L. M., Fu S. J., Zhang X. Y., Guo S. Z., Wang Y. J., Yang C., 2022, FlowSpectrum: A concrete characterization scheme of network traffic behavior for anomaly detection, World Wide Web-Internet and Web Information Systems, Vol. 25, No. 5, pp. 2139-2161

Patil P. N., Ross K. C., Boyles S. D., 2021, Convergence behavior for traffic assignment characterization metrics, Transportmetrica A: Transport Science, Vol. 17, No. 4, pp. 1244-1271

Feng W., Ding X. F., 2021, A congestion attack behaviour recognition method for wireless sensor networks based on a decision tree, International Journal of Sensor Networks, Vol. 36, No. 4, pp. 236-242

Abideen M. Z. ul, Saleem S., Ejaz M., 2019, VPN traffic detection in SSL-protected channel, Security and Communication Networks, Vol. 2019

Bai H. W., Liu W. W., Liu G. J., Dai Y. W., Huang S. H., 2021, Application behavior identification in DNS tunnels based on spatial-temporal information, IEEE Access, Vol. 9, pp. 80639-80653

Liu J. Y., Wang L. T., Hu W., Gao Y. T., Cao Y. F., Lin B. J., Zhang R., 2023, Spatial-temporal feature with dual-attention mechanism for encrypted malicious traffic detection, Security and Communication Networks, Vol. 2023

Kattadige C., Choi K. N., Wijesinghe A., Nama A., Thilakaranthna K., Seneviratne S., Jourjon G., 2021, SETA++: Real-time scalable encrypted traffic analytics in multi-Gbps networks, IEEE Transactions on Network and Service Management, Vol. 18, No. 3, pp. 3244-3259

Niktabe S., Lashkari A. H., Roudsari A. H., 2024, Unveiling DoH tunnel: Toward generating a balanced DoH encrypted traffic dataset and profiling malicious behavior using inherently interpretable machine learning, Peer-to-Peer Networking and Applications, Vol. 17, No. 1, pp. 507-531

Zhang K., Deng M. J., Gong B., Miao Y. B., Ning J. T., 2024, Privacy-preserving traceable encrypted traffic inspection in blockchain-based industrial IoT, IEEE Internet of Things Journal, Vol. 11, No. 2, pp. 3484-3496

Abu Al-Haija Q., Al-Badawi A., 2022, Attack-aware IoT network traffic routing leveraging ensemble learning, Sensors, Vol. 22, No. 1

Zheng X. C., Li H., 2023, Identification of malicious encrypted traffic through feature fusion, IEEE Access, Vol. 11, pp. 80072-80080

dos Santos B. V., Vergütz A., Macedo R. T., Nogueira M., 2023, A dynamic method to protect user privacy against traffic-based attacks on smart home, Ad Hoc Networks, Vol. 149

Caicedo-Muñoz J. A., Espino A. L., Corrales J. C., Rendón A., 2018, QoS-classifier for VPN and non-VPN traffic based on time-related features, Computer Networks, Vol. 144, pp. 271-279

Zhang X. Q., Zhao M., Wang J. Y., Li S., Zhou Y., Zhu S. N., 2022, Deep-forest-based encrypted malicious traffic detection, Electronics, Vol. 11, No. 7

Yang J., Lim H., 2021, Deep learning approach for detecting malicious activities over encrypted secure channels, IEEE Access, Vol. 9, pp. 39229-39244

Hong Y. P., Li Q., Yang Y. Q., Shen M., 2023, Graph-based encrypted malicious traffic detection with hybrid analysis of multi-view features, Information Sciences, Vol. 644

Lin P., Ye K. J., Hu Y. S., Lin Y. Y., Xu C. Z., 2023, A novel multimodal deep learning framework for encrypted traffic classification, IEEE/ACM Transactions on Networking, Vol. 31, No. 3, pp. 1369-1384

Bader O., Lichy A., Dvir A., Dubin R., Hajaj C., 2024, OSF-EIMTC: An open-source framework for standardized encrypted internet traffic classification, Computer Communications, Vol. 213, pp. 271-284

Kato H., Haruta S., Sasase I., 2020, Android malware detection scheme based on level of SSL server certificate, IEICE Transactions on Information and Systems, Vol. E103-D, No. 2, pp. 379-389

Bhati B. S., Rai C. S., 2020, Analysis of support vector machine-based intrusion detection techniques, Arabian Journal for Science and Engineering, Vol. 45, No. 4, pp. 2371-2383

Bhati B. S., Dikshita, Bhati N. S., Chugh G., 2022, A comprehensive study of intrusion detection and prevention systems, Wireless Communication Security, pp. 115-142

Weijie Song

Weijie Song was born in Shaoguan, Guangdong, China in 1988. He obtained a bachelor's degree from Guangdong University of Technology. He now works at the Zhuhai Power Supply Bureau, where he is mainly engaged in network security work.

Zufeng Hou

Zufeng Hou was born in Shaoguan, Guangdong, China in 1985. He obtained a master's degree from Guangdong University of Technology. He now works at the Zhuhai Power Supply Bureau, where he is mainly engaged in network security technology management for power monitoring systems.

Sixiao Guo

Sixiao Guo was born in Guangdong, China in 1988. She obtained a graduate degree in science from Longbiao University. She currently works at the Zhuhai Power Supply Bureau, where she is mainly engaged in information project management and local area network management.

Zhige Liao

Zhige Liao was born in 1986 in Yingde, Guangdong, China, and obtained an undergraduate degree from Zhuhai University of Beijing Institute of Technology. Zhige Liao now works at the Zhuhai Power Supply Bureau of Guangdong Power Grid Company, mainly engaged in IT operation and maintenance management and digital innovation work.

Jiadong Yan

Jiadong Yan was born in Yueyang, Hunan, China in 1991. He obtained an undergraduate degree from Chongqing University and a graduate degree from Southwest Jiaotong University. He now works at the Zhuhai Power Supply Bureau, where he is mainly engaged in main grid dispatch automation and network security work.

IEIE SPC IEIE Transactions on Smart Processing & Computing
