
  1. (Tibet Police College, Lhasa, Tibet Autonomous Region 850000, China; wf_wfeng@outlook.com, yongzcmu@hotmail.com)



Keywords: Accuracy, Convolutional neural network, Digital video, Forgery, Legal

1. Introduction

Digital video forgery is a byproduct of the continuous development and progress of deep learning [1]. False digital videos can be generated by manipulating the target character’s expressions, actions, and language using algorithms [2]. With the widespread availability of internet access, these forged videos are quickly disseminated. Although digital video forgery may assist education and entertainment, it significantly threatens personal privacy and legal security. The increasing accessibility of apps and software that facilitate digital video forgery has made it ever more challenging for the public to discern the authenticity of digital videos. If misused, these forged videos can lead to various security issues. From a legal perspective, forged digital videos can profoundly impact the legal system if they are used for evidence collection. Therefore, legal regulations explicitly prohibit the creation and dissemination of inauthentic news stories and the use of information to forge the likeness and voices of others. However, relying solely on legal regulations is not enough to counter the adverse effects of digital video forgery; it is imperative to enhance detection efficiency through technical means [3]. Suratkar et al. [4] proposed a detection method based on a convolutional recursive neural network and demonstrated that the model’s performance can be effectively improved through transfer learning. Agarwal et al. [5] detected forged videos by leveraging the fact that the dynamics of the mouth shape are occasionally inconsistent with a spoken phoneme, and validated the method’s effectiveness through experiments on various types of videos. Pashine et al. [6] analyzed the efficacy of several state-of-the-art neural networks, such as MesoNet and VGG-19, for detecting forged videos and identified the optimal solution for different scenarios based on the features of these methods. Hao et al. [7] proposed a detection approach based on a 3D convolutional neural network (CNN), which achieved high accuracy in forged video detection. Although some current forgery detection methods can partially address the issue of digital video forgery in practice, there is still room to improve detection performance for better efficiency and practical applicability. Therefore, this paper analyzed relevant laws from several jurisdictions and made two enhancements at the technical level to EfficientNet-V2, a network that is currently widely used and performs well. An improved EfficientNet-V2 method with a stronger feature extraction ability was developed by combining the convolutional block attention module (CBAM) with the Mish activation function. The effectiveness of this method in detecting digital video forgery was analyzed using different datasets, and it was compared with existing detection methods to verify its detection performance. This study provides a more reliable method for digital video forgery detection, offers theoretical support for the further improvement and application of EfficientNet-V2, and contributes to the regulation of digital video forgery from the perspectives of law and technology.

2. Digital Video Forgery in the Legal Perspective

2.1. The Harm of Digital Video Forgery

With deep learning, fake digital videos are no longer plagued by the obvious flaws typically associated with traditional video forgery techniques such as frame deletion or insertion. As a result, detecting these forged videos becomes much more challenging. Deep learning enables the migration of faces and expressions within videos, making them even harder to identify. Whether the technology is beneficial or harmful depends on the user. While digital video forgery can offer convenience and assistance in areas like film and television production, video restoration, and smart shopping, it can also cause significant harm when used maliciously.

(1) Personal safety

If digital video forgery is used in pornographic videos, it damages personal reputation and privacy; if a face is forged to carry out telecommunications fraud, victims may be unable to distinguish the forgery and suffer property losses.

(2) Social security

If digital video forgery is used to disseminate false news [8], it will lead to a decline in the public’s trust in the media, and the spread of rumors will be more convenient, triggering a crisis of faith in society.

(3) National security

If digital video forgery is used for politicians’ speech falsification, it will threaten national security.

2.2. Relevant Laws on Digital Video Forgery

China: The Network Audio and Video Information Service Administration Regulations explicitly mention “deep forgery” technology. These regulations require service providers to detect and identify non-authentic audio and video information promptly. However, in practice, implementing these regulations can be challenging. Currently, the regulation of digital video forgery in China primarily relies on existing laws, such as the Civil Code, the Criminal Law, and the Personal Information Protection Law. These laws address violations of portrait rights and infringements of citizens’ personal information. However, there are specific dilemmas in implementing these laws, and supervision needs improvement.

Singapore: The Protection from Online Falsehoods and Manipulation Act has made provisions against forged audio and video content. The Election (Maintenance of Integrity in Internet Advertising) Regulations prohibit the use of generative technologies during elections to splice and synthesize false content involving an individual’s voice, facial expressions, and body movements.

USA: At the federal level, regulations have been proposed to address digital video forgery through acts such as the Malicious Deep Fake Prohibition Act of 2018 and the Deepfakes Accountability Act. The Deepfakes Report Act of 2019 provides a clear definition of “digital content forgery,” while comprehensive protection for personal data is ensured through acts like the Commercial Facial Recognition Privacy Act of 2019 and the Privacy Rights Act.

France: The Penal Code imposes punishment for “publishing visual or audio content generated or replicated through algorithms” without the person’s consent.

Germany: The Strafgesetzbuch, the Bürgerliches Gesetzbuch, and other laws have made provisions against the dissemination of false information, cracking down on the fabrication of digital videos. The Gesetz zur Verbesserung der Rechtsdurchsetzung in sozialen Netzwerken also requires websites to delete blatantly illegal information.

UK: The Data Protection Act provides protection for information such as images of personal faces.

Korea: The Personal Information Protection Act prohibits relevant institutions from collecting and using personal information without the consent of the individuals involved.

Russia: The government manages the fabrication of digital videos from the perspective of online public opinion regulation.

European Union: It implements the General Data Protection Regulation to protect personal data, including citizen images that could be used to create deepfake content. The Code of Practice on Disinformation explicitly states that internet companies are required to regulate and address false information on platforms, including digitally manipulated videos.

3. Forgery Detection Method based on EfficientNet-V2

3.1. EfficientNet

In the face of the inadequacy of the existing laws regulating digital video forgery, this paper investigates detection methods for digital video forgery at the level of technical means. EfficientNet is a new type of CNN [9]. Its principle is to improve the network’s feature extraction ability by balancing the depth, width, and input image resolution of the network: the optimal coefficients for network depth, width, and resolution are determined separately and combined to scale the model as a whole and improve its accuracy. The scalability in depth, width, and resolution allows EfficientNet to obtain a more diverse feature representation and better adapt to different tasks and requirements. EfficientNet uses the scaling factor ϕ to balance network accuracy and efficiency. The scaling relationship can be expressed as:

(1)
$ \max_{d,w,r} Accuracy[N(d,w,r)], $
(2)
$ d = \alpha^\phi, $
(3)
$ w = \beta^\phi, $
(4)
$ r = \gamma^\phi, $
(5)
$ s.t. \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, $
(6)
$ \alpha \ge 1, \beta \ge 1, \gamma \ge 1, $

where α, β, and γ are scaling ratios.
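As an illustration, Eqs. (2)-(6) can be checked numerically. The base ratios below (α = 1.2, β = 1.1, γ = 1.15) are the ones reported for EfficientNet-B0 in the original EfficientNet work, not values taken from this article:

```python
# Illustrative sketch of EfficientNet compound scaling, Eqs. (2)-(6).
# ALPHA, BETA, GAMMA are the EfficientNet-B0 base ratios; phi is the
# user-chosen compound scaling coefficient.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution ratios

def compound_scale(phi: float):
    """Return the depth, width, and resolution multipliers d, w, r."""
    d = ALPHA ** phi
    w = BETA ** phi
    r = GAMMA ** phi
    return d, w, r

# The constraint alpha * beta^2 * gamma^2 ~= 2 in Eq. (5) means that
# increasing phi by 1 roughly doubles the FLOPs of the scaled network.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2   # ~1.92, close to 2
d, w, r = compound_scale(1.0)
```

Because all three dimensions are controlled by the single coefficient ϕ, the B0-B7 sub-networks discussed below differ only in how far this scaling is pushed.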

The EfficientNet network contains nine layers; the first layer is a dimensionality-raising convolutional layer. Dimensionality raising is performed on an input feature map with a size of H × W × C:

(7)
$ F_1(X) = g_u(X_{[r \cdot H_1, r \cdot W_1, w \cdot C_1]}), $
(8)
$ g_u(X) = f(X * k_u + b_u), $

where gu refers to a dimensionality-raising convolution operation, f is an activation function, * is a convolution operation, ku is the weight of the convolution kernel, and bu is the offset of the convolution kernel.

Layers 2-8 are used for feature extraction through stacked MBConv modules. The calculation formulas are:

(9)
$ F_i(X) = g_d \odot g_{se} \odot g_w \odot g_u(X_{[r \cdot H_i, r \cdot W_i, w \cdot C_i]}), \quad i = 2, \ldots, 8, $
(10)
$ g_d(X) = f(X * k_d + b_d), $
(11)
$ g_{se}(X) = T_2[f_{se}(T_1(Z))]X, $
(12)
$ Z = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x(i, j), $
(13)
$ g_w(X) = f_{dw}[f_{pw}(X)], $

where gd is the dimensionality-reduction convolution operation, kd and bd are the weight and offset of its convolution kernel, gse is the squeeze-and-excitation (SE) attention mechanism operation, fse is the mean pooling operation, T1 and T2 are the fully connected operations of the attention mechanism, Z is the feature compression intermediate quantity, gw is the depthwise-separable convolution operation, fpw is the point-by-point convolution operation, and fdw is the channel-by-channel convolution operation.
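The separable convolution gw of Eq. (13) can be sketched in PyTorch: a point-by-point (1×1) convolution followed by a channel-by-channel (depthwise) convolution. The channel sizes below are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    """Sketch of g_w in Eq. (13): f_dw applied after f_pw."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # f_pw: 1x1 point-by-point convolution mixing channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # f_dw: groups=out_ch applies one k x k filter per channel
        self.depthwise = nn.Conv2d(out_ch, out_ch, k, padding=k // 2,
                                   groups=out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.depthwise(self.pointwise(x))

x = torch.randn(1, 16, 32, 32)           # illustrative feature map
y = SeparableConv(16, 32)(x)             # spatial size preserved
```

Splitting the convolution this way is what keeps the parameter count of the stacked MBConv modules low compared with a regular k × k convolution over all channel pairs.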

The final ninth layer is the output layer, and its calculation formula is:

(14)
$ F_9(X) = g_c \odot g_p \odot g_d(X_{[r \cdot H_9, r \cdot W_9, w \cdot C_9]}), $
(15)
$ g_c(X) = f(WX + b_c), $
(16)
$ g_p(X) = f[\delta * \mu(X) + b_p], $

where gc is the full connection layer operation, W and bc are the weight and offset of the fully connected (FC) layer, gp is the pooling layer operation, δ and bp are the weight and offset of the pooling layer, and µ refers to the pooling approach.

Based on different scaling coefficients, EfficientNet can be divided into eight sub-networks: B0-B7. However, the composition modules of the networks are all the same. Taking EfficientNet-B0 as an example, its structure is shown in Table 1.

Table 1. The structure of EfficientNet-B0.

Stage Operator Resolution Channel Layer
1 Conv3×3 224×224 32 1
2 MBConv1, k3×3 112×112 16 1
3 MBConv6, k3×3 112×112 24 2
4 MBConv6, k5×5 56×56 40 2
5 MBConv6, k3×3 28×28 80 3
6 MBConv6, k5×5 14×14 112 3
7 MBConv6, k5×5 14×14 192 4
8 MBConv6, k3×3 7×7 320 1
9 Conv1×1&Pooling&FC 7×7 1280 1

3.2. Improved EfficientNet-V2

EfficientNet-V2 is a new image classification network [10]. On the basis of EfficientNet, Fused-MBConv is used in the shallow layers, while MBConv is kept in the deep layers. Fused-MBConv replaces the dimensionality-raising convolutional layer and the depthwise-separable convolutional layer in the original MBConv with a single regular Conv3×3 convolution, which runs more efficiently in the shallow stages of the network. The network structure is illustrated in Table 2.

Table 2. EfficientNet-V2 network structure

Stage Operator Channel Layer
1 Conv3×3 24 1
2 Fused-MBConv1, k3×3 24 2
3 Fused-MBConv4, k3×3 48 4
4 Fused-MBConv4, k3×3 64 4
5 MBConv4, k3×3, SE0.25 128 6
6 MBConv6, k3×3, SE0.25 160 9
7 MBConv6, k3×3, SE0.25 256 15
8 Conv1×1&Pooling&FC 1,280 1

The SE attention mechanism used in EfficientNet-V2 can only focus on channel information. To further improve the detection of digital video forgery, this paper introduces the convolutional block attention module (CBAM) [11]. CBAM attends to channel and spatial information simultaneously, obtaining a better feature extraction effect.

(1) Channel attention

(17)
$ M_c(X) = \sigma[MLP(AvgPool(X)) + MLP(MaxPool(X))], $

where σ is the Sigmoid activation function, MLP refers to the FC layers, AvgPool is the mean pooling operation, and MaxPool is the maximum pooling operation. This equation mainly focuses on the channels’ vital information. According to the output weight vector, each channel of the feature map is weighted, and unimportant channel features are suppressed, realizing the compression of the feature map in the spatial dimension.

(2) Spatial attention

(18)
$ M_s(X) = \sigma\{f^{7 \times 7}([AvgPool(X); MaxPool(X)])\}, $

where f7×7 refers to the convolution operation whose filter size is 7 × 7. This equation mainly focuses on the spatially important information. According to the output weight vector, each pixel of the feature map is weighted, and unimportant areas are suppressed to realize the compression of the feature map in the channel dimension.
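Eqs. (17) and (18) can be combined into one module. A minimal PyTorch sketch of CBAM follows; the reduction ratio of 16 in the shared MLP is a common default, not a value specified in this paper:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Eq. (17): shared MLP over mean- and max-pooled channel vectors."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))      # AvgPool over H x W
        mx = self.mlp(x.amax(dim=(2, 3)))       # MaxPool over H x W
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w                            # weight each channel

class SpatialAttention(nn.Module):
    """Eq. (18): 7x7 conv over channel-wise mean and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)       # pool across channels
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                            # weight each pixel

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))
```

In the improved network, a module like this would stand in for the SE block inside each MBConv, so the output shape must match the input shape exactly.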

In addition to using CBAM instead of the SE module, this paper also improves the activation function in EfficientNet-V2. The Swish function is used in the original EfficientNet-V2:

(19)
$ Swish(X) = X \cdot \sigma(\beta X), $

where β is a trainable parameter. This paper uses the smoother Mish function to replace the Swish function to enhance the network accuracy. The Mish function is written as:

(20)
$ Mish(X) = X \cdot \tanh[\ln(1 + e^X)]. $
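Eq. (20) can be checked numerically with a few lines of plain Python; log1p is used for the softplus term:

```python
import math

def mish(x: float) -> float:
    """Eq. (20): Mish(x) = x * tanh(ln(1 + e^x)) = x * tanh(softplus(x))."""
    return x * math.tanh(math.log1p(math.exp(x)))

# Unlike ReLU, Mish is smooth everywhere and allows small negative
# outputs, e.g. mish(-1) ~ -0.303, while mish(0) = 0 and mish(x) ~ x
# for large positive x.
```

This smoothness near zero is the property the replacement relies on: gradients do not vanish abruptly for small negative activations as they do with ReLU-like functions.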

The MBConv structure of the Mish function is shown in Fig. 1. The improved EfficientNet-V2 flowchart is presented in Fig. 2.

Fig. 1. The improved MBConv.


Fig. 2. The flowchart of the improved EfficientNet-V2.


4. Results and Analysis

4.1. Experimental Setup

The experiments were performed on a Windows 10 system with 128 GB of memory. The algorithm was implemented using PyTorch 1.7. For training EfficientNet-V2, a batch size of 16 and a learning rate of 0.0001 were used. A total of 50 training epochs were run using the stochastic gradient descent (SGD) optimizer. Numerous publicly available datasets are used in digital video forgery detection research. The datasets utilized in this paper are listed below.
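The training configuration above can be sketched as follows. The tiny linear model and random tensors are illustrative stand-ins for the actual EfficientNet-V2 detector and the video-frame batches, and only two epochs are run here instead of 50:

```python
import torch

# Sketch of the reported training setup: batch size 16, learning rate
# 1e-4, SGD optimizer. The linear model is a placeholder, not the paper's
# detection network.
torch.manual_seed(0)
model = torch.nn.Linear(128, 2)                       # stand-in network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(2):                                # 50 in the paper
    inputs = torch.randn(16, 128)                     # one batch of 16
    labels = torch.randint(0, 2, (16,))               # real (0) / forged (1)
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
```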

(1) FF++ [12]

The dataset comprises 1,000 genuine videos sourced from YouTube and 4,000 manipulated videos generated using techniques like DeepFakes. In order to simulate image quality loss in the real world, FF++ includes not only uncompressed source videos but also lightly compressed high-quality (HQ) videos and heavily compressed low-quality (LQ) videos.

(2) DFDC [13]

The dataset consists of 1,131 authentic videos featuring 66 actors. Additionally, 4,119 forged videos were created using two undisclosed forgery methods. The forged faces in this dataset exhibit a high level of realism.

(3) Celeb-DF [14]

The dataset includes 890 genuine videos sourced from YouTube. Furthermore, 5,639 forged videos were generated using an enhanced DeepFake forgery method.

All the above datasets were divided into a training set and test set according to 7:3 during the experiment. The method of extracting one frame every 30 frames was used to realize the sub-framing. A multi-task convolutional neural network (MTCNN) [15] was employed to realize face detection. The processed images were scaled in equal proportions to complete the preprocessing. The performance of the digital video forgery detection method from the legal perspective was evaluated based on the detection accuracy:

(21)
$ ACC = \frac{TP + TN}{TP + FN + FP + TN}. $

The number of real videos that are detected as real is represented by TP. The number of forged videos that are detected as forged is represented by TN. The number of real videos that are detected as forged is represented by FN. The number of forged videos detected as real is represented as FP.
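The preprocessing and evaluation steps above can be sketched in plain Python: sampling one frame every 30 frames, splitting clips 7:3, and computing Eq. (21). The MTCNN face-detection and equal-proportion scaling steps are only noted in comments, since they require the video files and a face-detection library:

```python
import random

def sample_frame_indices(total_frames: int, step: int = 30):
    """One frame every `step` frames, as in the sub-framing step."""
    return list(range(0, total_frames, step))

def split_7_3(items, seed=0):
    """Shuffle and split a clip list 7:3 into training and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.7)
    return items[:cut], items[cut:]

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (21): share of correctly classified real and forged videos."""
    return (tp + tn) / (tp + fn + fp + tn)

frames = sample_frame_indices(total_frames=300)   # [0, 30, ..., 270]
train, test = split_7_3(range(1000))              # 700 / 300 clips
# Each sampled frame would then go through MTCNN face detection and be
# scaled in equal proportions before being fed to the network.
acc = accuracy(tp=83, tn=84, fp=16, fn=17)        # toy counts, not paper data
```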

4.2. Results Analysis

To assess the effectiveness of the proposed digital video forgery detection method from a legal standpoint, the enhancements made to EfficientNet-V2 were evaluated on various datasets. The outcomes of these comparisons are presented in Fig. 3.

As shown in Fig. 3, it is evident that the improved EfficientNet-V2 (EfficientNet-V2+CBAM+Mish) algorithm achieved higher accuracy in detecting digital video forgery across different datasets. Notably, when the LQ dataset from FF++ was used, all algorithms exhibited low detection performance due to the degraded video quality. However, the EfficientNet-V2+CBAM+Mish algorithm achieved the highest accuracy (83.45%), showing a 0.24% improvement over the EfficientNet-V2+CBAM algorithm and a 2.23% improvement over the EfficientNet-V2 algorithm. When the HQ dataset from FF++ was used, the EfficientNet-V2+CBAM+Mish algorithm achieved the highest accuracy (95.21%), marking a 0.15% increase compared to the EfficientNet-V2+CBAM algorithm and a 1.85% increase compared to the EfficientNet-V2 algorithm. Additionally, when the DFDC and Celeb-DF datasets were used, the EfficientNet-V2+CBAM+Mish algorithm demonstrated the highest accuracy values of 97.66% and 99.17%, respectively. These findings validated the effectiveness of the proposed enhancements (CBAM and Mish) to EfficientNet-V2.

Fig. 3. Performance analysis of the improved EfficientNet-V2.


Then, the improved EfficientNet-V2 algorithm was compared with other current detection methods, and the results on different datasets are displayed in Table 3.

Table 3. Comparison with existing detection methods (unit: %).

MesoNet [16] Xception [17] RNN [18] The improved EfficientNet-V2 algorithm
FF++ LQ 68.13 81.73 81.93 83.45
HQ 81.80 91.67 93.80 95.21
DFDC 89.33 94.83 97.33 97.66
Celeb-DF 86.93 94.59 97.13 99.17

As shown in Table 3, the improved EfficientNet-V2 method outperformed several other existing methods for detecting digital video forgery across different datasets. MesoNet exhibited the lowest detection accuracy among the compared methods, achieving only 68.13% for the LQ dataset from FF++ and 89.33% for the DFDC dataset. Xception achieved accuracy values above 90% but below 95% for the HQ dataset from FF++ and the DFDC and Celeb-DF datasets. Compared to the RNN algorithm, the improved EfficientNet-V2 algorithm showed a notable improvement, with accuracy enhancements of 1.52% and 1.41% for the LQ and HQ datasets from FF++ and 0.33% and 2.04% for the DFDC and Celeb-DF datasets, respectively. These results validated the effectiveness of the proposed method in detecting digital video forgery.

To assess the generalization capability of the method proposed in this paper, the FF++ dataset was used as the training set. In contrast, the DFDC and Celeb-DF datasets were utilized as test sets to evaluate the accuracy of the different methods in cross-domain experiments. The results are presented in Fig. 4.

Fig. 4. Comparison of accuracy between different methods in cross-domain experiments.


Fig. 4 shows that the MesoNet method exhibited low accuracy (53.80% and 57.80%) for the DFDC and Celeb-DF datasets in cross-domain experiments. The Xception method also demonstrated low accuracy, with 63.35% for the DFDC dataset and 69.67% for the Celeb-DF dataset. The RNN algorithm performed poorly in DFDC detection, achieving only 62.75% accuracy, but showed better performance for the Celeb-DF dataset with an accuracy of 74.80%. On the other hand, the proposed method maintained a higher accuracy than the other detection methods in cross-domain experiments, albeit with a slight decrease. It achieved 69.12% accuracy for the DFDC dataset and 88.76% accuracy for the Celeb-DF dataset, demonstrating good generalization ability for detecting digital video forgery.

5. Discussion

Digital video forgery technology plays a specific positive role in education, medicine, and other fields, such as assisting medical diagnosis by synthesizing authentic medical images and realizing virtual teaching by synthesizing digital videos. However, with the advancement of technology, digital video forgery has become simpler and easier to implement. It has been more frequently applied to various illegal scenarios, becoming a tool to realize fraud, defamation, etc.

In April 2021, a criminal gang in Hefei utilized artificial intelligence (AI) technology to manipulate and forge individuals’ faces, facilitating illegal activities such as fraudulent cellphone card registrations within the black and gray market. In May 2023, a victim was deceived by a forged video in a telecom fraud scheme, losing 4.3 million yuan within 10 minutes. In April 2024, the Hong Kong police revealed that a victim was swindled out of HK$200 million after unknowingly participating in a video conference staged with AI face-swapping technology. These incidents highlight the growing prevalence of unlawful activities exploiting digital video forgery techniques due to their low technical threshold. While China has imposed some legal restrictions on the improper use of digital video forgery through the Civil Code, the Criminal Law, and the Personal Information Protection Law, several shortcomings still need to be addressed.

The current laws face difficulties in terms of their scope of application, implementation, and protection. The definition of the right to portrait in the Civil Code is somewhat ambiguous, potentially leading to insufficient protection for rights holders. Illegally obtained source data through digital video forgery techniques often come in large quantities and from unknown sources, making it difficult to clarify ownership. Therefore, achieving effective legal implementation becomes challenging as well. Additionally, the Criminal Law mainly focuses on punishing unlawful acquisition and sale of citizens’ personal information but lacks clear regulations regarding the illegal use of legally obtained personal information.

Furthermore, the regulation of digital video forgery needs to be improved. The division of labor and the boundaries between the Internet Information Department and the Public Security Department should be made more precise. The penalties for inadequate supervision of online platforms are often lenient and primarily focused on remedial measures after the fact. These measures are insufficient to compensate victims for their losses. In this regard, the low cost of violating the law makes it difficult to effectively curb the prevalence of digital video forgery.

The regulation of digital video forgery requires a combination of legal and technical approaches. This paper provided some suggestions to address the current legal challenges from the perspective of law.

(1) Improving existing laws and regulations on digital video forgery: The Civil Code should expand the scope of protection for portrait rights to encompass the "forgery using information technology means", ensuring comprehensive protection for individuals’ portrait rights. Regulations on information collectors and processors should be strengthened to safeguard the information subject’s right to know. Criminal law should enhance regulations on information recognition, such as face and voice, and penalize the illegal use of legally obtained information.

(2) Improving the regulatory mode and clarifying the regulatory subject for digital video forgery: The management responsibility of online platforms should be strengthened, and public supervision should be fully utilized to achieve comprehensive regulation. Both positive and malicious use of digital video forgery technology need to be regulated through clear legal provisions, in order to achieve a balance between technological development and protection of legal interests.

In terms of technology, this article proposed an improved EfficientNet-V2 method and validated its detection performance using different datasets. However, there are some limitations, such as the lack of a lightweight model design and limited evidence of model generalization. The method demonstrated excellent performance on the benchmark datasets, but its performance on unseen real-world data remains to be verified. Therefore, in future work, it is necessary to focus on real-world videos and construct datasets from the real world to further validate the model and improve its generalization. It is also worth exploring the optimization of forgery detection methods by combining techniques such as bagging with convolutional neural networks. Additionally, studying features such as blinking, speech patterns, and lip movements in forged videos can contribute to effective forgery detection and provide more viable methods for this purpose.

6. Conclusion

The article provides a brief analysis of current digital video forgery from a legal perspective and proposes an improved EfficientNet-V2 method for detecting digital video forgery. Experiments conducted on existing forged digital video datasets showed that the improved EfficientNet-V2 achieved higher accuracy across different datasets and outperformed existing detection methods. It also maintained good detection performance in cross-domain experiments, making it applicable for practical detection of digital video forgery. This method can contribute to regulating the use of digital video forgery techniques and limiting the spread of false information.

References

1. Negi S., Jayachandran M., Upadhyay S., 2021, Deep fake: An understanding of fake images and videos, International Journal of Scientific Research in Computer Science, Engineering and Information Technology, Vol. 7, No. 3, pp. 183-189.
2. Ivanov N. S., Arzhskov A. V., Ivanenko V. G., 2020, Combining deep learning and super-resolution algorithms for deep fake detection, Proc. of IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering, pp. 326-328.
3. He P., Li H., Li B., Wang H., Liu L., 2020, Exposing fake bitrate videos using hybrid deep-learning network from recompression error, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, No. 11, pp. 4034-4049.
4. Suratkar S., Bhiungade S., Pitale J., Soni K., Badgujar T., Kazi F., 2023, Deep-fake video detection approaches using convolutional-recurrent neural networks, Journal of Control and Decision, Vol. 10, No. 2, pp. 198-214.
5. Agarwal S., Farid H., Fried O., Agrawala M., 2020, Detecting deep-fake videos from phoneme-viseme mismatches, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 2814-2822.
6. Pashine S., Mandiya S., Gupta P., Sheikh R., 2021, Deep fake detection: Survey of facial manipulation detection solutions, International Research Journal of Engineering and Technology, Vol. 8, No. 5, pp. 4441-4449.
7. Hao X., Li M., 2021, Deepfake video detection based on 3D convolutional neural networks, Computer Science, Vol. 48, No. 7, pp. 86-92.
8. Wang Y., Dantcheva A., 2020, A video is worth more than 1000 lies: Comparing 3DCNN approaches for detecting deepfakes, Proc. of IEEE International Conference on Automatic Face and Gesture Recognition, pp. 515-519.
9. Beaulieu A., Thullier F., Bouchard K., Maître J., Gaboury S., 2022, Ultra-wideband data as input of a combined EfficientNet and LSTM architecture for human activity recognition, Journal of Ambient Intelligence and Smart Environments, Vol. 14, No. 3, pp. 157-172.
10. Shi Y., Ma Z., Chen H., Ke Y., Chen Y., Zhou X., 2024, High-resolution recognition of FOAM modes via an improved EfficientNet V2 based convolutional neural network, Frontiers of Physics, Vol. 19, No. 3, 32205.
11. Zhao S., Nguyen T. H., Ma B., 2021, Monaural speech enhancement with complex convolutional block attention module and joint time-frequency losses, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6648-6652.
12. Rössler A., Cozzolino D., Verdoliva L., Riess C., Thies J., Niessner M., 2019, FaceForensics++: Learning to detect manipulated facial images, Proc. of IEEE/CVF International Conference on Computer Vision, pp. 1-11.
13. Dolhansky B., Howes R., Pflaum B., Baram N., Ferrer C. C., 2019, The deepfake detection challenge (DFDC) preview dataset, arXiv preprint arXiv:1910.08854.
14. Li Y., Yang X., Sun P., Qi H., Lyu S., 2020, Celeb-DF: A large-scale challenging dataset for deepfake forensics, Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3204-3213.
15. Zhang K., Zhang Z., Li Z., Qiao Y., 2016, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Processing Letters, Vol. 23, No. 10, pp. 1499-1503.
16. Afchar D., Nozick V., Yamagishi J., Echizen I., 2018, MesoNet: A compact facial video forgery detection network, Proc. of IEEE International Workshop on Information Forensics and Security, pp. 1-7.
17. Chollet F., 2017, Xception: Deep learning with depthwise separable convolutions, Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1800-1807.
18. Montserrat D. M., Hao H., Yarlagadda S. K., Baireddy S., Shao R., Horváth J., Bartusiak E., Uang J., Güera D., Zhu R., Delp E. J., 2020, Deepfakes detection with automatic face weighting, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 2851-2859.
Feng Wang

Feng Wang was born in 1983. He graduated from Chinese People’s Public Security University with a master’s degree in June 2014. He is working at Tibet Police College as an associate professor. He is interested in procedural law.

Yong Zhong Cuo Mu

Yong Zhong Cuo Mu was born in 1989. She received her bachelor’s degree from Shangrao Normal University. Her main research direction is computer techniques. She works at Tibet Police College as a lecturer.