Wonjun Kim
(Department of Electrical and Electronics Engineering, Konkuk University, Seoul 05029, Korea, wonjkim@konkuk.ac.kr)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Biometric information, Face anti-spoofing, Unseen types of spoofing attacks
1. Introduction
The increasing use of face authentication on mobile devices has led to an urgent need
to protect users’ facial information from various spoofing attacks. For example,
a facial image can easily be forged using printed photos, 3D masks, replayed videos on tablets,
etc., and malicious log-in attempts via such invalid cues often occur to steal personal
information. Therefore, a robust method for face anti-spoofing is highly required
to make biometric-based authentication systems more reliable.
To accurately distinguish real faces from fake ones, early works concentrated on finding
appropriate feature descriptors to reveal the differences of micro-patterns on the
facial surface. Most representatively, local binary patterns (LBPs) have been widely
employed to grasp the subtle differences of local textures between real and fake facial
images [1,2]. Inspired by the great success of LBP-based face anti-spoofing, many variants have
been introduced. For example, some studies have applied LBPs to the chrominance space
generated by HSV or YCbCr color conversions [3]. Color moments are also combined with LBP descriptors to efficiently emphasize discriminative
properties of local textures in different color channels [4]. LBP-based face anti-spoofing methods are conceptually simple, are easy to implement,
and show promising performance. However, such ``handcrafted'' feature descriptors
have limitations in completely covering statistical characteristics of diverse spoofing
materials.
Due to the great success of deep learning techniques, several researchers have started
to adopt deep neural networks (DNNs) to implicitly learn properties of real and fake
faces, even with a wide range of variations. As a first step, the problem of face
anti-spoofing was formulated as binary classification, where the role of the trained
model is to determine whether a given facial image is fake or not. Based on various
architectures of convolutional neural networks (CNNs), the performance of face anti-spoofing
has been significantly improved [5,6].
To achieve reliable performance in real-world applications, auxiliary information
such as depth and remote photoplethysmography (rPPG) signals has been incorporated
into traditional CNNs via additional branches [7,8]. Classification-based approaches with CNNs have shown reliable performance for various
spoofing attacks, but they are still vulnerable to unseen types of fabrication materials.
These approaches only work well for samples from the same dataset that was used to
train the models. Because the types of spoofing attacks in real-world applications
cannot be fully covered by a limited number of datasets, learning generalized spoofing
cues has become an essential task. From this point of view, the key open question is
whether deep learning-based face anti-spoofing techniques remain feasible beyond the
datasets on which they were trained.
Most recently, several studies have explored shared features in an embedding space
constructed from different datasets to efficiently improve the generalization ability
across diverse presentation attacks. For example, zero-shot learning [9], domain adaptation [10], and domain generalization [11] have been actively studied. There is also another line of research that does not
require any fake samples (i.e., one-class learning [12]), and such methods are naturally easier to apply to real-world applications.
However, their accuracy is still far from the level required for deployment.
This paper provides a comprehensive review of face anti-spoofing techniques. We explain
diverse methods with a systematic taxonomy, and benchmark datasets for face anti-spoofing
are analyzed in detail. By laying out the pros and cons of previous methods, this paper
can guide newcomers toward suitable solutions. Finally, the paper discusses future
directions for face anti-spoofing research. The main contributions can be summarized as follows:
$\textbf{·}$ Key aspects of face anti-spoofing applicable in real-world scenarios
are discussed in depth, including how to formulate the problem of face anti-spoofing,
how to categorize previous works, and in what direction the research has been progressing.
$\textbf{·}$ Instead of simply introducing previous works along a timeline, the core issues
that have shifted research trends are also presented. This is expected to help in developing
more reliable face anti-spoofing systems.
$\textbf{·}$ This review also provides a systematic taxonomy of methods for face anti-spoofing
with their strengths and drawbacks. Through detailed analysis, insights into future research
are given with constructive discussion.
The rest of this paper is organized as follows. The next section describes the general
process of face anti-spoofing and a systematic taxonomy. Methodologies of each category
are explained in detail in Section 3. Some examples of face anti-spoofing on benchmark
datasets are demonstrated, and corresponding discussions are given in Section 4. Conclusions
follow in Section 5.
2. Face Anti-spoofing: General Process and Systematic Taxonomy
The goal of face anti-spoofing is to determine whether a given image is fake or not,
as shown in Fig. 1. Therefore, it is most important to extract discriminative features from real and
fake facial images. All the methods for face anti-spoofing have attempted to highlight
the difference between features of real and fake facial images to robustly perform
in ambiguous cases.
In general, such methods can be broadly categorized into two main groups: handcrafted
feature-based and learned feature-based approaches. In the first category, most approaches
concentrate on designing image descriptors with special attention to textural patterns.
This is done because fake facial images are captured by a camera at least twice, so
micro-textures on the surface of fake facial images probably have different patterns
from those of real ones. Most previous methods adopted and modified existing feature
descriptors, which have been widely used for various tasks in the field of computer
vision, such as Fourier spectra [13], difference of Gaussians (DoG) [14-16], histograms of oriented gradients (HOG) [22], ensemble of visual quality metrics [17-19], local binary patterns (LBPs) [1, 20, 21, 4] and its variants (e.g., local Gabor
binary patterns (LGBPs)) [23], local speed patterns (LSPs) [24], etc.
Handcrafted feature-based models are simple to implement and show robust performance,
particularly for spoofing attacks using printed photos and tablets. However, as types
of spoofing attacks become more and more diverse, such handcrafted feature-based methods
suffer from a lack of representation power. To cope with this limitation, many researchers
have adopted DNNs and used features learned from diverse training samples of spoofing
attacks. In this category, the problem of face anti-spoofing is formulated as a binary
classification task (determining whether a given facial image is fake or not). Inspired
by the great success of DNNs in the field of image classification, models in this
category directly apply many backbone architectures to extract discriminative features
in the embedding space, such as VGG [25], ResNet [26], etc. The embedding space is constructed based on large amounts of training samples.
To improve the performance of DNN-based approaches, auxiliary information, such as
depth, rPPG signals, image quality, etc., is also incorporated into the learning procedure
using additional branches [7, 8, 27]. There have also been several attempts to resolve
the problem of face anti-spoofing by exploiting the generative cues. For example,
spoofing noise can be implicitly estimated in a pixel-wise manner through encoder-decoder
architecture, and estimated results are employed for computing the spoofing score
[28,29]. Spoofing evidence from different materials is also estimated by bilateral residuals
in a multi-level network architecture and used for face anti-spoofing [30].
Learned feature-based methods result in significant improvement, even in diverse types
of spoofing attacks, but most approaches still suffer from ``unseen'' samples that
are not contained in the training dataset. This weakness makes it difficult for DNN-based
face anti-spoofing methods to be deployed in real-world scenarios. Most recently,
learned feature-based approaches have aimed at improving the generalization ability
of the encoding network. To do this, various techniques have been actively explored,
such as domain adaptation [10], domain generalization [31,11], and one-class learning [12, 32, 33]. The goal of domain adaptation and generalization
is to accurately reveal shared features of real facial images while minimizing the
difference between statistical distributions of different datasets. However, spoofing
samples from specific domains are inevitably used for contrastive learning, even if
the number is small. Thus, these approaches are still expected to suffer from unseen
attacks occurring in real-world scenarios.
One-class learning schemes reformulate the problem of face anti-spoofing as outlier
detection. That is, they compactly represent an anchor point by only using features
of real facial images and define others far from this point as fake samples, regardless
of spoofing types. This strategy is expected to work well even with unseen test samples,
but the detection accuracy is currently not reliable enough. The categorizations with
strengths and drawbacks are shown in Table 1.
Fig. 1. General process of face anti-spoofing.
Table 1. Systematic taxonomy for face anti-spoofing.
Methods | Category | Analysis | Strengths | Drawbacks
[13] | HF | Fourier spectra | Simple to implement; Fast operation | Weak to nonlinear distortion; Dependency on types of devices
[14], [15], [16] | HF | DoG | Simple to implement; Effective for printed attacks | Weak to high-resolution attacks; Limitation of bandwidths
[22] | HF | HOG | Simple to implement; Robust to noise | Weak to attacks by warped papers; Variations by quantization levels
[1], [20], [21], [23], [4], [24] | HF | LBP and its variants | Simple to implement; Low computational cost; Robust to various types of spoofing | Weak to attacks by warped papers; Relatively large memory required
[17], [18], [19] | HF | Image quality | Unified framework for anti-spoofing; Easy to add new metrics | Weak to high-resolution attacks; Dependency on types of devices
[7], [8], [27] | LF | Depth, rPPG | Robust to attacks by 3D masks; High accuracy in the intra test | Complicated process of generating ground truth; Weak to unseen attacks in training
[28], [29] | LF | Spoof noise | Robust to locally nonlinear spoofing; High accuracy in the intra test | No ground truth for spoof noise; Weak to unseen attacks in training
[30] | LF | Material perception | Robust to diverse materials; High accuracy in the intra test | Complicated network architecture; Slow to converge in training
[9] | LF | Tree structure | Unsupervised nature; Easy to extend to new spoofing types | Complicated network architecture; Slow to converge in training
[10] | LF | Discrepancy | Easy to transfer features; High accuracy in the inter test | High cost of using many datasets
[31], [11] | LF | Adversarial learning | Good generalization ability; High accuracy in the inter test | High cost of using many datasets
[12], [33] | LF | SVM, GMM, autoencoder | One-class (i.e., real face) training; Robust to unseen samples | Relatively low accuracy
[32] | LF | Feature correlation | One-class (i.e., real face) training; Robust to unseen samples | Relatively low accuracy
(HF = handcrafted feature, LF = learned feature, SVM = support vector machine, GMM = Gaussian mixture model)
3. Face Anti-spoofing: Methodologies
In this section, methodologies for each approach are explained in detail based on
the systematic taxonomy presented above.
3.1 Handcrafted Feature-based Models
Over the past years, various feature descriptors have been adopted for face anti-spoofing.
Basically, since fake facial images are fabricated through a camera (i.e., recaptured),
their high-frequency components are weaker than those of real facial images. Accordingly,
many algorithms operate in the spectral domain. For example, Zhang et al.
[14] used four DoG filters to obtain multiple spectral responses and input filtered images
to an SVM classifier. Tan et al. [15] combined textural patterns with DoG responses and extended the sparse logistic regression
classifier both nonlinearly and spatially to improve its generalization capability
for spectral features.
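As an illustration of this spectral strategy, the following sketch builds a difference-of-Gaussians (DoG) feature vector and passes it to an SVM. It is not the exact configuration of [14]; the filter bandwidths (sigma pairs) and crop size are assumed values used only for demonstration.

```python
# Minimal sketch of DoG-based spoof detection in the spirit of [14]:
# several band-pass responses are flattened and fed to an SVM classifier.
# The sigma pairs below are illustrative, not the settings of [14].
import cv2
import numpy as np
from sklearn.svm import SVC

SIGMA_PAIRS = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0), (2.0, 4.0)]  # four DoG bands

def dog_features(gray_face):
    """Stack band-pass responses of a grayscale face crop into one vector."""
    img = gray_face.astype(np.float32) / 255.0
    bands = []
    for s1, s2 in SIGMA_PAIRS:
        low = cv2.GaussianBlur(img, (0, 0), s1)    # kernel size derived from sigma
        high = cv2.GaussianBlur(img, (0, 0), s2)
        bands.append((low - high).ravel())          # band-pass (DoG) response
    return np.concatenate(bands)

# Assumed training data: X_train is a list of face crops, y_train uses 1 = real, 0 = fake.
# clf = SVC(kernel="rbf").fit([dog_features(x) for x in X_train], y_train)
# label = clf.predict([dog_features(test_crop)])
```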
The degree of directional coherence is also a useful clue since the fabrication process
weakens the gradient magnitude. Yang et al. [22] divided a given facial image into sub-components (eyes, mouth, nose, etc.) and extracted
HOG from each region to represent directional properties. The spectral loss from fabrication
leads to degradation of the image quality, so there have been meaningful approaches
that employ the scores of image quality metrics as feature vectors. Galbally and Marcel
[17] adopted multiple image quality metrics and aggregated their scores to form a feature vector,
which was fed into an SVM classifier for training and testing. This strategy is not
limited to the specific modality of biometrics, so it can be applied to other authentication
systems, such as fingerprint and iris systems [18].
Many studies have shown that LBPs on the facial surface are effective for revealing the
subtle differences between real and fake facial images. Määttä et al. [1] simply computed LBP histograms from multi-scale levels and concatenated corresponding
outputs as a feature descriptor for a given facial image. Chingovska et al. [20] similarly extracted LBP features in both global and local regions and computed the
final spoofing score from the ensemble of outputs from global and local classifiers.
Another study [21] extracted LBP histograms from the spatio-temporal plane to consider textural variations
along the timeline. Patel et al. [4] combined multi-scale LBP histograms with color moments and generated a single feature
vector to account for both color characteristics and textural patterns.
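The multi-scale LBP descriptor described above can be sketched as follows. This is a simplified illustration rather than the exact setting of [1]; the (P, R) scales are assumed values.

```python
# Hedged sketch of a multi-scale LBP descriptor in the spirit of [1]:
# uniform LBP histograms at several radii are concatenated into one
# feature vector, which is then fed to a classifier (e.g., an SVM).
import numpy as np
from skimage.feature import local_binary_pattern

LBP_SCALES = [(8, 1), (8, 2), (16, 2)]  # (neighbors P, radius R), illustrative values

def multiscale_lbp_histogram(gray_face):
    feats = []
    for P, R in LBP_SCALES:
        codes = local_binary_pattern(gray_face, P, R, method="uniform")
        n_bins = P + 2  # P+1 uniform patterns plus one "non-uniform" bin
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        feats.append(hist)
    return np.concatenate(feats)  # downstream classifier decides real vs. fake
```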
Inspired by the significant improvement by encoding intensity patterns in a small
local region, several variants have been devised with other values. For example, responses
of multiple Gabor filters are encoded in the same way as LBPs for face anti-spoofing
[23]. From a similar point of view, face anti-spoofing can be done with local phase quantization
(LPQ) [34], which represents the textural patterns by quantizing the image spectrum (i.e., local
spectral coefficients). The diffusion characteristics are quite different between
real and fake facial images, so LSPs were introduced, which showed notable improvement
on a mobile device [24]. LBP and its variants are easy to implement and fast, but their
performance has reached its limit as spoofing attacks diversify day by day.
3.2 Learned Feature-based Models
Thanks to the great success of DNNs in the field of computer vision, many researchers
have begun to rely on learned features to efficiently encode diverse properties
of real and fake facial images in the embedding space. Since the problem of face anti-spoofing
can be regarded as binary classification, traditional CNNs, which have shown successful
performance in image classification, were applied to this task first. Li et al.
[35] directly employed a VGG-face network to extract facial features and conduct a subspace
analysis to refine such features for computing an accurate score for face anti-spoofing.
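In its simplest form, this binary formulation amounts to fine-tuning a pretrained backbone with a two-class head, as sketched below. The snippet is a generic illustration rather than the exact pipeline of [35]; the choice of ResNet-18, the optimizer, and the learning rate are assumptions.

```python
# Generic sketch of face anti-spoofing as binary classification:
# fine-tune a pretrained backbone with a two-class head and cross-entropy loss.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)        # two classes: real vs. fake

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step; labels are class indices (1 = real, 0 = fake)."""
    logits = model(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```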
To further improve the performance, auxiliary information has been combined with the
network. For example, Atoum et al. [5] proposed extracting semantic features from facial patches and simultaneously estimating
depth values in a pixel-wise manner based on the two-stream network architecture.
Liu et al. [7] designed an auxiliary supervision scheme by incorporating the rPPG signal and depth
map into convolutional and recurrent neural networks. Such depth and rPPG signals
have been popular to guide the learning procedure more accurately [8,27].
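The auxiliary-supervision idea can be summarized by a network with two heads, one for the binary decision and one for a pixel-wise depth map, trained with a combined loss. The sketch below is heavily simplified with respect to [5,7]; the layer sizes, the flat depth target for fake faces, and the weighting factor are illustrative assumptions.

```python
# Simplified sketch of auxiliary depth supervision: a shared encoder feeds a
# binary head and a pixel-wise depth head, and the two losses are combined.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(64, 1, 1)                     # pixel-wise depth map
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):
        f = self.encoder(x)
        return self.cls_head(f), self.depth_head(f)

def loss_fn(logit, depth_pred, label, depth_gt, lam=0.5):
    # Real faces are supervised with an estimated depth map and fake faces with a
    # flat (all-zero) map, following the auxiliary-supervision idea; lam is assumed.
    cls_loss = F.binary_cross_entropy_with_logits(logit, label)
    depth_loss = F.mse_loss(depth_pred, depth_gt)
    return cls_loss + lam * depth_loss
```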
There have been several attempts to explicitly estimate spoofing noise through the
encoder-decoder architecture. Jourabloo et al. [28] defined a new degradation model with spoofing noise. Based on this model, they designed
a de-spoofing network to implicitly estimate the spoofing noise. Feng et al. [29] proposed a two-stage network composed of a spoof cue generator and auxiliary classifier.
They did not impose any explicit constraint on spoof cues by spoofing samples for
the generator to be generalized well against unseen attacks. Even though learned feature-based
approaches bring significant improvement with various DNN architectures, their performance
is limited to the dataset used for training, so the performance is hardly guaranteed
when unseen samples are given for testing.
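Conceptually, the de-spoofing idea can be reduced to the sketch below: a small network predicts a pixel-wise noise map, the input minus this map approximates the "live" face, and the magnitude of the map serves as a spoofing score. The architecture and scoring rule are illustrative and much simpler than those in [28,29].

```python
# Loose sketch of the spoof-noise idea: estimate a pixel-wise noise map,
# subtract it to obtain a "de-spoofed" face, and score spoofing by its magnitude.
import torch
import torch.nn as nn

class DeSpoofNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),              # estimated spoof noise map
        )

    def forward(self, x):
        noise = self.net(x)
        live = x - noise                                  # reconstructed "live" face
        score = noise.abs().mean(dim=(1, 2, 3))           # larger score -> more likely spoof
        return live, noise, score
```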
Most recently, the research trend has been going toward learning generalized spoofing
cues. Li et al. [10] trained a mapping function to align a source domain with a target one using
the maximum mean discrepancy (MMD) metric in the embedding feature space. Shao et
al. [31] proposed learning domain-specific feature extractors separately with corresponding
discriminators. One generator is simultaneously trained to provide the generalized
feature space by adopting an adversarial learning strategy with domain-specific discriminators.
Similarly, Jia et al. [11] also focused on constructing a generalized feature space by only using real facial
images with adversarial loss.
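For reference, an MMD term of the kind used in [10] can be written as a kernel two-sample statistic between mini-batches of source and target features. A minimal sketch is given below; the RBF bandwidth is an assumed constant.

```python
# Minimal squared-MMD estimate between source and target feature batches,
# which can be minimized alongside the classification loss to align domains.
import torch

def rbf_kernel(a, b, sigma=1.0):
    d2 = torch.cdist(a, b) ** 2                 # pairwise squared Euclidean distances
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_loss(source_feats, target_feats, sigma=1.0):
    k_ss = rbf_kernel(source_feats, source_feats, sigma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, sigma).mean()
    k_st = rbf_kernel(source_feats, target_feats, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st             # squared MMD estimate
```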
These two methods adopt the triplet loss concept for training to maximize the distance
between real and fake features in the embedding space. Such methods show effective
generalization ability based on the performance evaluation in the inter-dataset test,
such as training the model on dataset A and testing it on dataset B. However, they
still require fake samples, which inevitably come from a limited number of benchmark
datasets. Therefore, learning generalized spoofing cues cannot be considered to fully
cover the diverse types of spoofing attacks occurring in real-world scenarios.
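A minimal form of this triplet constraint is sketched below, where a real face serves as the anchor, another real face as the positive, and a fake face as the negative; the margin value is an assumption.

```python
# Minimal triplet constraint: pull real-face embeddings together and push
# fake-face embeddings away by at least a margin in the feature space.
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    d_pos = F.pairwise_distance(anchor, positive)   # real vs. real
    d_neg = F.pairwise_distance(anchor, negative)   # real vs. fake
    return F.relu(d_pos - d_neg + margin).mean()
```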
A few attempts for one-class learning have been made to ultimately accomplish face
anti-spoofing regardless of the type of spoofing attack. The heart of this approach
is to use only real facial images since it is almost impossible to allow for all types
of spoofing attacks. As a pioneering work, Xiong et al. [12] assumed that, for an autoencoder
trained only on real facial images, the reconstruction error of a spoofing input is relatively
large compared to that of a real facial input. To verify this, they used a multi-layer perceptron-based
architecture as the autoencoder and showed the potential of the neural network-based
one-class learning scheme. Moreover, the concept of outlier detection was also introduced
and tested based on a one-class SVM classifier and Gaussian mixture model (GMM) [12].
In line with this research direction, Lim et al. [32] proposed a novel feature correlation network (FCN) to precisely compute the similarity
with features of real facial images, which are learned using deep dual generators.
George and Marcel [33] adopted multiple modalities (color, infrared, depth, and thermal inputs) and conducted
contrastive learning based on a simple CNN with both real and spoofed facial samples.
Encoded features are employed to simultaneously learn GMM. In the inference phase,
features extracted from the network are compared with each center of Gaussians to
determine whether a given input is an outlier (i.e., spoofing sample) or not. Even
though one-class learning approaches are most suitable for real-world applications,
the accuracy of spoofing detection is inferior to the case of using both real and
fake facial samples for training the model.
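The reconstruction-based variant of this one-class scheme can be sketched as follows; the network size and decision threshold are illustrative and do not correspond to the exact settings of [12].

```python
# One-class sketch: an autoencoder is trained only on real faces, and a test
# sample with a large reconstruction error is flagged as a possible spoof.
import torch
import torch.nn as nn

class FaceAutoencoder(nn.Module):
    def __init__(self, dim=64 * 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 32))
        self.decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def is_spoof(model, face_vec, threshold=0.05):
    """Flag a flattened face crop as spoof when its reconstruction error is large."""
    with torch.no_grad():
        recon = model(face_vec)
        error = torch.mean((recon - face_vec) ** 2).item()
    return error > threshold   # threshold is an assumed, dataset-dependent value
```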
Fig. 2. Samples from the SiW-M dataset (13 types of spoofing attacks) [9].
Table 2. Summary of published datasets for face anti-spoofing.
Datasets | # of Subjects | # of Videos | Spoof Types | Modality
NUAA | 15 | 12,614 (images) | Print | RGB
Replay-Attack | 50 | 1,200 | Print, Replay | RGB
CASIA-FASD | 50 | 600 | Print, 2D Mask (Cut), Replay | RGB
MSU-MFSD | 55 | 440 | Print, Replay | RGB
OULU-NPU | 55 | 5,940 | Print, Replay | RGB
SiW-M | 493 | 1,630 | Print, Replay, 3D Mask, Make-up, 2D Partial Mask | RGB
CASIA-SURF | 1,000 | 21,000 | Print, 2D Mask (Cut) | RGB / IR / Depth
4. Face Anti-spoofing: Performance
In this section, we introduce several benchmark datasets, which have been most widely
employed for face anti-spoofing, along with criteria for performance evaluation. Based
on this, some visual results on a mobile device are presented with an example of qualitative
evaluation.
4.1 Benchmark Datasets
In order to fairly report the performance of face anti-spoofing, the NUAA dataset
[15] was constructed first. This dataset is composed of 15 subjects who are positioned
in front of a web camera with a neutral expression. To make spoofing samples, pictures
of subjects are printed on photographic paper and normal A4 paper. The total numbers
of real and fake facial images in the NUAA dataset are 5,105 and 7,509, respectively. The
facial region of each picture is cropped by a Viola-Jones detector and normalized to 64x64
pixels based on an eye localizer. The NUAA dataset made much sense as a first
step, but its limited variety of forgery attacks needed to be addressed.
Many other benchmark datasets have been actively constructed for face anti-spoofing.
The Replay-Attack dataset [20] comprises 1,300 video clips with resolution of 320x240 pixels. These videos are acquired
from 50 people under different lighting conditions. This dataset contains three types
of spoofing attacks, which are made with printed paper, smartphone screens, and high-resolution
tablet screens. In order to consider more real-world environments, video clips are
captured in two different ways: fixed-support and hand-held methods.
Similarly, the CASIA-FASD dataset [14] is constructed with 50 subjects with paper and screen-based fabrications. The MSU-MFSD
dataset [19] is built with more spoofing media to reflect realistic situations effectively.
For example, notebook PCs, different types of smartphones, tablets, and paper are used
for generating real and fake facial images. More recently, the OULU-NPU dataset [36] has been introduced and is one of the largest datasets for face anti-spoofing. This
dataset consists of 5,940 video clips acquired from 55 subjects with six different
smartphones. To check the performance with various viewpoints, face anti-spoofing
methods are tested via four unknown presentation attack detection (UPAD) evaluation
protocols on the OULU-NPU dataset.
With increasing types of spoofing attacks, studies for face anti-spoofing are starting
to require diversified datasets rather than just showing a large number of video clips.
Most representatively, the SiW-M dataset [9] has been introduced and contains 13 types of spoofing attacks: replay, print, half
mask, silicone, transparent, papercraft, mannequin, obfuscation, cosmetic, impersonation,
funny eye, paper glasses, and partial paper attacks. Therefore, recent methods mostly
employ the SiW-M dataset to show their generalization abilities (i.e., robustness
to unseen attacks). Some samples from the SiW-M dataset are shown in Fig. 2.
Multiple modalities are also considered to improve the performance of face anti-spoofing.
To this end, the CASIA-SURF dataset [37] was constructed with color, infrared, and depth sensors. The multi-modal data were
taken from 1,000 subjects, making it a very large-scale dataset. A summary of these
published datasets is shown in Table 2.
4.2 Evaluation Metrics
Several quantitative metrics have been used to fairly compare the performance of face
anti-spoofing methods based on such benchmark datasets. The area under the receiver
operating characteristic curve (AUC) is one of the most widely employed metrics. The
half total error rate (HTER) is computed using the average value of the false rejection
rate (FRR) and false acceptance rate (FAR) and has also been popular for this task.
The equal error rate (EER) is simultaneously computed by finding a point where the
FAR value equals the FRR value.
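In formula form, these definitions read $\mathrm{HTER} = (\mathrm{FAR} + \mathrm{FRR})/2$, while the EER is the common value $\mathrm{FAR}(\tau^{*}) = \mathrm{FRR}(\tau^{*})$ at the decision threshold $\tau^{*}$ where the two error rates coincide.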
The attack presentation classification error rate (APCER) denotes the proportion of
spoofing samples that are misclassified as real ones, while the bona fide presentation
classification error rate (BPCER) indicates the proportion of real facial samples
incorrectly detected as spoofing attacks [38]. To account for the trade-off between APCER and BPCER, the BPCER20 metric
has also been used; it indicates the BPCER value when APCER is fixed at
5\%. Based on these metrics, the performance of face anti-spoofing methods has been
fairly verified and compared.
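Given per-sample scores and ground-truth labels, these rates can be computed in a few lines. The sketch below is illustrative and assumes the convention that higher scores indicate real faces and that a decision threshold has already been chosen.

```python
# Illustrative helper (not tied to any specific benchmark toolkit) that turns
# per-sample spoofing scores into the error rates defined above.
import numpy as np

def error_rates(scores, labels, threshold):
    """labels: 1 = real (bona fide), 0 = attack; higher score = more likely real."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pred_real = scores >= threshold
    attacks, reals = labels == 0, labels == 1
    apcer = np.mean(pred_real[attacks])      # attacks accepted as real
    bpcer = np.mean(~pred_real[reals])       # real faces rejected as attacks
    hter = (apcer + bpcer) / 2.0             # APCER/BPCER playing the roles of FAR/FRR
    return apcer, bpcer, hter
```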
Fig. 3. Some examples of face anti-spoofing [32] (green: real faces, red: fake faces).
4.3 Overall Performance
The aim of this review is to introduce methodologies of face anti-spoofing according
to the systematic taxonomy and to give constructive prospects for future research
directions. Therefore, a detailed performance analysis and comparison of each method
is not dealt with in this paper. Instead, several results of face
anti-spoofing are demonstrated with pictorial examples in this subsection. Before
checking face anti-spoofing results, the overall procedure in real-world scenarios
is summarized as follows: the facial region is detected from a captured image and
normalized to a specific resolution. To do this, various algorithms can be adopted,
such as MTCNN [39], TinyFace [40], etc. The corresponding result is fed into a classifier (e.g., SVM, DNN, etc.) to
determine whether the cropped and normalized region contains a real face or not.
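A high-level sketch of this deployment pipeline is given below. The detector and classifier interfaces (detect, score) are assumed placeholders for any of the algorithms mentioned above (e.g., MTCNN [39] for detection and one of the models from Section 3 for classification); the crop size and score threshold are likewise illustrative.

```python
# High-level sketch of the deployment pipeline: detect the face, crop and
# normalize it, then pass the crop to an anti-spoofing classifier.
import cv2

def check_liveness(frame_bgr, face_detector, spoof_classifier, size=(224, 224)):
    boxes = face_detector.detect(frame_bgr)              # assumed detector interface
    results = []
    for (x, y, w, h) in boxes:
        crop = cv2.resize(frame_bgr[y:y + h, x:x + w], size)
        score = spoof_classifier.score(crop)              # assumed: higher = more likely real
        results.append(((x, y, w, h), score > 0.5))       # True -> treated as a real face
    return results
```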
Some pictorial examples of face anti-spoofing are shown in Fig. 3. The first two rows show results for test samples from the CASIA-FASD dataset, while
results for the Replay-Attack dataset are shown in the last two rows. These results
were generated by the one-class learning method [32]. Other approaches also yield the same type of output (a facial region with marked
label). As an example of quantitative evaluation, the performance on the OULU-NPU
dataset with four UPAD protocols is also shown in Table 3. A motion blur-based face anti-spoofing method [41] is additionally evaluated for this test since it shows high accuracy. Based on evaluation
metrics mentioned in the previous subsection, all the methods for face anti-spoofing
can be reliably verified.
Table 3. Performance on the OULU-NPU dataset.
Prot. | Method | APCER (%) | BPCER (%)
1 | Auxiliary [7] | 1.6 | 1.6
1 | MADDoG [31] | 10.6 | 14.3
1 | Motion [41] | 7.7 | 10.5
1 | LGSC [29] | 7.7 | 12.3
1 | Material [30] | 0.0 | 1.6
2 | Auxiliary [7] | 2.7 | 2.7
2 | MADDoG [31] | 4.5 | 8.4
2 | Motion [41] | 5.4 | 5.7
2 | LGSC [29] | 2.6 | 3.5
2 | Material [30] | 2.6 | 0.8
3 | Auxiliary [7] | 2.7±1.3 | 3.1±1.7
3 | MADDoG [31] | 7.4±5.7 | 10.8±9.8
3 | Motion [41] | 5.1±4.9 | 10.4±10.1
3 | LGSC [29] | 4.1±4.6 | 6.5±8.1
3 | Material [30] | 2.8±2.4 | 2.3±2.8
4 | Auxiliary [7] | 9.3±5.6 | 10.4±6.0
4 | MADDoG [31] | 4.0±3.5 | 9.1±8.0
4 | Motion [41] | 3.9±2.4 | 7.9±6.7
4 | LGSC [29] | 2.7±1.7 | 7.9±7.1
4 | Material [30] | 2.9±4.0 | 7.5±6.9
4.4 Discussion
Even though the accuracy of face anti-spoofing on benchmark datasets is improving
quickly, previous methods are rarely applied in practical applications. For example,
face verification modules are still used on smartphones without face anti-spoofing,
so a printed photo can pass authentication on another person's device. Therefore,
beyond the laboratory environment, the face anti-spoofing community now needs
to prepare solutions so that the developed methods can be deployed in real-world scenarios.
To this end, research is expected to go in the following directions:
$\textbf{· Generalization:}$ As explained above, the biggest obstacle to commercialization
of face anti-spoofing is vulnerability to unseen attacks. Even though learned feature-based
approaches show promising results for intra-testing on various benchmark datasets,
the performance of such trained models is still limited to the given domain properties
(i.e., they still suffer from unfamiliar distributions frequently occurring in real-world
environments), and thus the accuracy of forgery detection drops significantly. Therefore,
it is highly desirable for future studies to focus on generalizing spoof cues in the
embedding space. It would also be very helpful to apply the concept of outlier detection
to the problem of face anti-spoofing, so learning schemes with only real facial images
(i.e., one-class learning) also need to be explored in depth.
$\textbf{· Stability:}$ Most methods for face anti-spoofing often show inconsistent
results for image sequences. This instability is also an important factor that hinders
the commercialization of face anti-spoofing methods. Several methods have considered
temporal affinity in the network architecture [42,43], but a light-weight model still needs to be developed for applications on embedded
platforms. Therefore, future work needs to allow for robustness against unexpected
artifacts by camera motions in a very efficient way.
$\textbf{· Diversification:}$ From the viewpoint of the dataset, more subjects and
types of spoofing attacks are required. Complying with this need, the SiW-M dataset
[9] has already been constructed and is popular for performance evaluation. Constructing
a dataset that is wider (more types of spoofing attacks) and deeper (more subjects)
is time-consuming and labor-intensive, but it is essential to consistently build such
datasets for DNN-based methods to be more reliable under diverse environments.
In addition to prospects mentioned above, practical studies should not be forgotten.
The ultimate goal of face anti-spoofing is to be used with face verification systems
on various embedded platforms, especially smartphones. To this end, a detailed analysis
of the processing time and memory usage of each method is highly required.
Some examples of face anti-spoofing results on mobile devices are shown in
Fig. 4. In summary, many challenging issues are still unresolved, but they will lead to
a new generation of more reliable and efficient methods for face anti-spoofing in
coming years.
Fig. 4. Test results under real-world environments.
5. Conclusion
In this paper, a comprehensive review for face anti-spoofing was given with a systematic
taxonomy. Various methods for face anti-spoofing were categorized into two main groups:
handcrafted feature-based and learned feature-based approaches. The strengths and
drawbacks of each group were appropriately analyzed in accordance with the research
trend of face anti-spoofing. Representative methods in each group were also explained
for beginners or experts to gain a general understanding of this task. Moreover,
benchmark datasets and evaluation metrics were introduced, followed by several experimental
results of face anti-spoofing. Based on the detailed analysis, prospects for realization
of face anti-spoofing on various embedded platforms were discussed. This review can
provide practical guidance for experts and newcomers contributing to this topic.
ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant,
which is funded by the Korean government (MSIT) (No. 2020R1F1A1068080).
REFERENCES
Määttä J., Hadid A., Pietikäinen M., Oct. 2011, Face spoofing detection from single
images using micro-texture analysis, in Proc. IEEE Int. Joint Conf. Biometrics (IJCB),
pp. 1-7
Han H., Klare B. F., Bonnen K., Jain A. K., Jan. 2013, Matching composite sketches
to face photos: a component-based approach, IEEE Trans. Inf. Forensics Security, Vol.
8, No. 1, pp. 191-204
Boulkenafet Z., Komulainen J., Hadid A., Aug. 2016, Face spoofing detection using
colour texture analysis, IEEE Trans. Inf. Forensics Security, Vol. 11, No. 8, pp.
1818-1830
Patel K., Han H., Jain A. K., Oct. 2016, Secure face unlock: spoof detection on smartphones,
IEEE Trans. Inf. Forensics Security, Vol. 11, No. 10, pp. 2268-2283
Atoum Y., Liu Y., Jourabloo A., Liu X., Oct. 2017, Face anti-spoofing using patch
and depth-based CNNs, in Proc. IEEE Int. Joint Conf. Biometrics (IJCB), pp. 319-328
Yu Z., Zhao C., Wang Z., Qin Y., Su Z., Li X., Zhou F., Zhao G., Jun. 2020, Searching
central difference convolutional networks for face anti-spoofing, in Proc. IEEE Int.
Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 5295-5305
Liu Y., Jourabloo A., Liu X., Jun. 2018, Learning deep models for face anti-spoofing:
Binary or auxiliary supervision, in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit.
(CVPR), pp. 389-398
Lin B., Li X., Yu Z., Zhao G., May 2019, Face liveness detection by rPPG features
and contextual patch-based CNN, in Proc. ICBEA, pp. 61-68
Liu Y., Stehouwer J., Jourabloo A., Liu X., Jun. 2019, Deep tree learning for zero-shot
face anti-spoofing, in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR),
pp. 4675-4684
Li H., Li W., Cao H., Wang S., Huang F., Kot A. C., Jul. 2018, Unsupervised domain
adaptation for face anti-spoofing, IEEE Trans. Inf. Forensics Security, Vol. 13, No.
7, pp. 1794-1809
Jia Y., Zhang J., Shan S., Chen X., Jun. 2020, Single-side domain generalization for
face anti-spoofing, in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR),
pp. 8481-8490
Xiong F., AbdAlmageed W., Oct 2018, Unknown presentation attack detection with face
RGB images, in Proc. IEEE Int. Conf. Biometrics: Theory Appl. Syst. (BTAS), pp. 1-9
Li J., Wang Y., Tan T., Jain A. K., Aug. 2004, Live face detection based on the analysis
of Fourier spectra, in Proc. SPIE, Biometric Technology for Human Identification, pp. 296-303
Zhang Z., Yan J., Liu S., Lei Z., Yi D., Li S. Z., Mar./Apr. 2012, A face antispoofing
database with diverse attacks, in Proc. IAPR Int. Conf. Biometrics (ICB), pp. 26-31
Tan X., Li Y., Liu J., Jiang L., Sep. 2010, Face liveness detection from a single
image with sparse low rank bilinear discriminative model, in Proc. Eur. Conf. Comput.
Vis. (ECCV), pp. 504-517
Peixoto B., Michelassi C., Rocha A., Sep. 2011, Face liveness detection under bad
illumination conditions, in Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 3557-3560
Galbally J., Marcel S., Aug. 2014, Face anti-spoofing based on general image quality
assessment, in Proc. IAPR Int. Conf. Pattern Recognit. (ICPR), pp. 1173-1178
Galbally J., Marcel S., Fierrez J., Feb. 2014, Image quality assessment for fake biometric
detection: Application to iris, fingerprint, and face recognition, IEEE Trans. Image
Process., Vol. 23, No. 2, pp. 710-724
Wen D., Han H., Jain A. K., Apr. 2015, Face spoof detection with image distortion
analysis, IEEE Trans. Inf. Forensics Security, Vol. 10, No. 4, pp. 746-761
Chingovska I., Anjos A., Marcel S., Sep. 2012, On the effectiveness of local binary
patterns in face anti-spoofing, in Proc. IEEE Int. Conf. Biometrics Special Interest
Group (BioSIG), pp. 1-7
de Freitas Pereira T., Anjos A., De Martino J. M., Marcel S., Nov. 2012, LBP-TOP based
countermeasure against face spoofing attacks, in Proc. Int. Workshop Comput. Vis.
Local Binary Pattern Variants (ACCV), pp. 121-132
Yang J., Lei Z., Liao S., Li S. Z., Jun. 2013, Face liveness detection with component
dependent descriptor, in Proc. IEEE Int. Conf. Biometrics (ICB), pp. 1-6
Chingovska I., Anjos A. R. D., Marcel S., Dec. 2014, Biometrics evaluation under spoofing
attacks, IEEE Trans. Inf. Forensics Security, Vol. 9, No. 12, pp. 2264-2276
Kim W., Suh S., Han J-J., Aug. 2015, Face liveness detection from a single image via
diffusion speed model, IEEE Trans. Image Process., Vol. 24, No. 8, pp. 2456-2465
Simonyan K., Zisserman A., Dec. 2015, Very deep convolutional networks for large-scale
image recognition, in Proc. Int. Conf. Learn. Represent. (ICLR), pp. 1-14
He K., Zhang X., Ren S., Sun J., Jun. 2016, Deep residual learning for image recognition,
in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 770-778
Feng L., Po L. M., Li Y., Xu X., Yuan F., Cheung T. C. H., Cheung K.-W., Jul. 2016,
Integration of image quality and motion cues for face antispoofing: A neural network
approach, J. Vis. Commun. Image Represent., Vol. 38, pp. 451-460
Jourabloo A., Liu Y., Liu X., Sep. 2018, Face de-spoofing: anti-spoofing via noise
modeling, in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 1-17
Feng H., Hong Z., Yue H., Chen Y., Wang K., Han J., Liu J., Ding E., 2020, Learning
generalized spoof cues for face anti-spoofing, arXiv preprint arXiv:2005.03922
Yu Z., Li X., Niu X., Shi J., Zhao G., Aug. 2020, Face anti-spoofing with human material
perception, in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 1-19
Shao R., Lan X., Li J., Yuen P. C., Jun. 2019, Multi-adversarial discriminative deep
domain generalization for face presentation attack detection, in Proc. IEEE Int. Conf.
Comput. Vis. Pattern Recognit. (CVPR), pp. 10023-10031
Lim S., Gwak Y., Kim W., Roh J-H., Cho S., Dec. 2020, One-class learning method based
on live correlation loss for face anti-spoofing, IEEE Access, Vol. 8, pp. 201635-201648
George A., Marcel S., 2021, Learning one class representation for face presentation
attack detection using multi-channel convolutional neural networks, IEEE Trans. Inf.
Forensics Security, Vol. 16, No. 1, pp. 361-375
Ghiani L., Marcialis G., Roli F., Nov. 2012, Fingerprint liveness detection by local
phase quantization, in Proc. IAPR Int. Conf. on Pattern Recognit. (ICPR), pp. 537-540
Li L., Feng X., Boulkenafet Z., Xia Z., Li M., Hadid A., Dec. 2016, An original face
anti-spoofing approach using partial convolutional neural network, in Proc. Int. Conf.
Image Process. Theory, Tools Appl. (IPTA), pp. 1-6
Boulkenafet Z., Komulainen J., Li L., Feng X., Hadid A., May 2017, OULU-NPU: A mobile
face presentation attack database with real-world variations, in Proc. IEEE Int. Conf.
Autom. Face Gesture Recognit. (FG), pp. 612-618
Zhang S., Liu A., Wan J., Liang Y., Guo G., Escalera S., Escalante H. J., Li S. Z.,
Apr. 2020, CASIA-SURF: a large-scale multi-modal benchmark for face anti-spoofing,
IEEE Trans. Bio. Behavior Iden. Sci., Vol. 2, No. 2, pp. 182-193
ISO/IEC JTC 1/SC 37 Biometrics, 2016, Information technology - Biometric presentation
attack detection - Part 1: Framework, International Organization for Standardization
Zhang K., Zhang Z., Li Z., Qiao Y., Oct. 2016, Joint face detection and alignment
using multitask cascaded convolutional networks, IEEE Signal Process. Lett., Vol.
23, No. 10, pp. 1499-1503
Hu P., Ramanan D., Jun 2017, Finding tiny faces, in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), pp. 1522-1530
Li L., Xia Z., Hadid A., Jiang X., Zhang H., Feng X., Sep. 2019, Replayed video attack
detection based on motion blur analysis, IEEE Trans. Inf. Forensics Security, Vol.
14, No. 9, pp. 2246-2261
Wang Z., Zhao C., Qin Y., Zhou Q., Qi G., Wan J., Lei Z., 2018, Exploiting temporal
and depth information for multi-frame face anti-spoofing, arXiv preprint arXiv:1811.05118
Yang X., Luo W., Bao L., Gao Y., Gong D., Zheng S., Li Z., Lei W., Jun. 2019, Face
anti-spoofing: model matters, so does data, in Proc. IEEE Int. Conf. Comput. Vis.
Pattern Recognit. (CVPR), pp. 3507-3516
Author
Wonjun Kim received a B.S. degree from the Department of Electronic Engineering,
Sogang University, Seoul, South Korea, in 2006, an M.S. degree from the Department
of Information and Communications, Korea Advanced Institute of Science and Technology
(KAIST), Daejeon, South Korea, in 2008, and a Ph.D. degree from the Department of
Electrical Engineering, KAIST, in 2012. From September 2012 to February 2016, he was
a Research Staff Member of the Samsung Advanced Institute of Technology (SAIT), South
Korea. Since March 2016, he has been with the Department of Electrical and Electronics
Engineering, Konkuk University, Seoul, where he is currently an associate professor.
His research interests include image and video understanding, computer vision, pattern
recognition, and biometrics, with an emphasis on background subtraction, saliency
detection, face, and action recognition. He has served as a regular reviewer for over
30 international journals, including the IEEE Transactions on Image Processing,
IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions
on Multimedia, IEEE Transactions on Cybernetics, IEEE Access, IEEE Signal Processing
Letters, and so on.