Wonjun Kim
(Department of Electrical and Electronics Engineering, Konkuk University, Seoul 05029, Korea, wonjkim@konkuk.ac.kr)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Biometric information, Face anti-spoofing, Unseen types of spoofing attacks
1. Introduction
The increasing use of face authentication on mobile devices has led to an urgent need
to protect users’ facial information from various spoofing attacks. For example,
a facial image can easily be forged using printed photos, 3D masks, replayed videos on tablets,
etc., and malicious log-in attempts via such invalid cues often occur to steal personal
information. Therefore, a robust method for face anti-spoofing is highly required
to make biometric-based authentication systems more reliable.
To accurately distinguish real faces from fake ones, early works concentrated on finding
appropriate feature descriptors to reveal the differences of micro-patterns on the
facial surface. Most representatively, local binary patterns (LBPs) have been widely
employed to grasp the subtle differences of local textures between real and fake facial
images [1,2]. Inspired by the great success of LBP-based face anti-spoofing, many variants have
been introduced. For example, some studies have applied LBPs to the chrominance space
generated by HSV or YCbCr color conversions [3]. Color moments are also combined with LBP descriptors to efficiently emphasize discriminative
properties of local textures in different color channels [4]. LBP-based face anti-spoofing methods are conceptually simple, are easy to implement,
and show promising performance. However, such ``handcrafted'' feature descriptors
have limitations in completely covering statistical characteristics of diverse spoofing
materials.
Due to the great success of deep learning techniques, several researchers have started
to adopt deep neural networks (DNNs) to implicitly learn properties of real and fake
faces, even with a wide range of variations. As a first step, the problem of face
anti-spoofing was formulated as binary classification, where the role of the trained
model is to determine whether a given facial image is fake or not. Based on various
architectures of convolutional neural networks (CNNs), the performance of face anti-spoofing
has been significantly improved [5,6].
To achieve reliable performance in real-world applications, auxiliary information
such as depth and remote photoplethysmography (rPPG) signals has been incorporated
into traditional CNNs via additional branches [7,8]. Classification-based approaches with CNNs have shown reliable performance for various
spoofing attacks, but they are still vulnerable to unseen types of fabrication materials.
These approaches only work well for samples from the same dataset that was used to
train the models. Because the types of spoofing attacks in real-world applications
cannot be fully covered by a limited number of datasets, learning generalized spoofing
cues has become an essential task. From this point of view, the key open question is
whether deep learning-based face anti-spoofing techniques remain feasible beyond the
datasets on which they were trained.
Most recently, several studies have explored shared features in an embedding space
constructed from different datasets to efficiently improve the generalization ability
across diverse presentation attacks. For example, zero-shot learning [9], domain adaptation [10], and domain generalization [11] have been actively studied. There is also another line of research that does not
require any fake samples (i.e., one-class learning [12]), and such methods are naturally easier to apply to real-world applications.
However, their accuracy is still far from the level required for deployment.
This paper provides a comprehensive review of face anti-spoofing techniques. We explain
diverse methods with a systematic taxonomy, and benchmark datasets for face anti-spoofing
are analyzed in detail. By laying out the pros and cons of previous methods, this paper
can guide newcomers toward suitable solutions. Finally, the paper discusses future
directions for face anti-spoofing research. The main contributions can be summarized as follows:
$\textbf{·}$ Key aspects of face anti-spoofing applicable in real-world scenarios
are discussed in depth, including how to formulate the problem of face anti-spoofing,
how to categorize previous works, and in what direction the research has been progressing.
$\textbf{·}$ Instead of simply introducing previous works along a timeline, the core issues
that have shifted research trends are also presented. This is expected to help in developing
more reliable face anti-spoofing systems.
$\textbf{·}$ This review also provides a systematic taxonomy of methods for face anti-spoofing
with their strengths and drawbacks. Through detailed analysis, insights into future research
are given with constructive discussion.
The rest of this paper is organized as follows. The next section describes the general
process of face anti-spoofing and a systematic taxonomy. Methodologies of each category
are explained in detail in Section 3. Some examples of face anti-spoofing on benchmark
datasets are demonstrated, and corresponding discussions are given in Section 4. Conclusions
follow in Section 5.
2. Face Anti-spoofing: General Process and Systematic Taxonomy
The goal of face anti-spoofing is to determine whether a given image is fake or not,
as shown in Fig. 1. Therefore, it is most important to extract discriminative features from real and
fake facial images. All the methods for face anti-spoofing have attempted to highlight
the difference between features of real and fake facial images to robustly perform
in ambiguous cases.
In general, such methods can be broadly categorized into two main groups: handcrafted
feature-based and learned feature-based approaches. In the first category, most approaches
concentrate on designing image descriptors with special attention to textural patterns.
This is done because fake facial images are captured by a camera at least twice, so
micro-textures on the surface of fake facial images probably have different patterns
from those of real ones. Most previous methods adopted and modified existing feature
descriptors, which have been widely used for various tasks in the field of computer
vision, such as Fourier spectra [13], difference of Gaussians (DoG) [14-16], histograms of oriented gradients (HOG) [22], ensemble of visual quality metrics [17-19], local binary patterns (LBPs) [1, 20, 21, 4] and its variants (e.g., local Gabor
binary patterns (LGBPs)) [23], local speed patterns (LSPs) [24], etc.
Handcrafted feature-based models are simple to implement and show robust performance,
particularly for spoofing attacks using printed photos and tablets. However, as types
of spoofing attacks become more and more diverse, such handcrafted feature-based methods
suffer from a lack of representation power. To cope with this limitation, many researchers
have adopted DNNs and used features learned from diverse training samples of spoofing
attacks. In this category, the problem of face anti-spoofing is formulated as a binary
classification task (determining whether a given facial image is fake or not). Inspired
by the great success of DNNs in the field of image classification, models in this
category directly apply many backbone architectures to extract discriminative features
in the embedding space, such as VGG [25], ResNet [26], etc. The embedding space is constructed based on large amounts of training samples.
To improve the performance of DNN-based approaches, auxiliary information, such as
depth, rPPG signals, image quality, etc., is also incorporated into the learning procedure
using additional branches [7, 8, 27]. There have also been several attempts to resolve
the problem of face anti-spoofing by exploiting the generative cues. For example,
spoofing noise can be implicitly estimated in a pixel-wise manner through encoder-decoder
architecture, and estimated results are employed for computing the spoofing score
[28,29]. Spoofing evidence from different materials is also estimated by bilateral residuals
in a multi-level network architecture and used for face anti-spoofing [30].
Learned feature-based methods result in significant improvement, even in diverse types
of spoofing attacks, but most approaches still suffer from ``unseen'' samples that
are not contained in the training dataset. This weakness makes it difficult for DNN-based
face anti-spoofing methods to be deployed in real-world scenarios. Most recently,
learned feature-based approaches have aimed at improving the generalization ability
of the encoding network. To do this, various techniques have been actively explored,
such as domain adaptation [10], domain generalization [31,11], and one-class learning [12, 32, 33]. The goal of domain adaptation and generalization
is to accurately reveal shared features of real facial images while minimizing the
difference between statistical distributions of different datasets. However, spoofing
samples from specific domains are inevitably used for contrastive learning, even if
the number is small. Thus, these approaches are still expected to suffer from unseen
attacks occurring in real-world scenarios.
One-class learning schemes reformulate the problem of face anti-spoofing as outlier
detection. That is, they compactly represent an anchor point by only using features
of real facial images and define others far from this point as fake samples, regardless
of spoofing types. This strategy is expected to work well even with unseen test samples,
but the detection accuracy is currently not reliable enough. The categorizations with
strengths and drawbacks are shown in Table 1.
Fig. 1. General process of face anti-spoofing.
Table 1. Systematic taxonomy for face anti-spoofing.
Methods | Category | Analysis | Strengths | Drawbacks
[13] | HF | Fourier spectra | Simple to implement; Fast operation | Weak to nonlinear distortion; Dependency on types of devices
[14], [15], [16] | HF | DoG | Simple to implement; Effective for printed attacks | Weak to high-resolution attacks; Limitation of bandwidths
[22] | HF | HOG | Simple to implement; Robust to noise | Weak to attacks by warped papers; Variations by quantization levels
[1], [20], [21], [23], [4], [24] | HF | LBP and its variants | Simple to implement; Low computational cost; Robust to various types of spoofing | Weak to attacks by warped papers; Relatively large memory required
[17], [18], [19] | HF | Image quality | Unified framework for anti-spoofing; Easy to add new metrics | Weak to high-resolution attacks; Dependency on types of devices
[7], [8], [27] | LF | Depth, rPPG | Robust to attacks by 3D masks; High accuracy in the intra test | Complicated process of generating ground truth; Weak to unseen attacks in training
[28], [29] | LF | Spoof noise | Robust to locally nonlinear spoofing; High accuracy in the intra test | No ground truth for spoof noise; Weak to unseen attacks in training
[30] | LF | Material perception | Robust to diverse materials; High accuracy in the intra test | Complicated network architecture; Slow to converge in training
[9] | LF | Tree structure | Unsupervised nature; Easy to extend to new spoofing types | Complicated network architecture; Slow to converge in training
[10] | LF | Discrepancy | Easy to transfer features; High accuracy in the inter test | High cost of using many datasets
[31], [11] | LF | Adversarial learning | Good generalization ability; High accuracy in the inter test | High cost of using many datasets
[12], [33] | LF | SVM, GMM, autoencoder | One-class (i.e., real face) training; Robust to unseen samples | Relatively low accuracy
[32] | LF | Feature correlation | One-class (i.e., real face) training; Robust to unseen samples | Relatively low accuracy
(HF = handcrafted feature, LF = learned feature, SVM = support vector machine, GMM = Gaussian mixture model)
3. Face Anti-spoofing: Methodologies
In this section, methodologies for each approach are explained in detail based on
the systematic taxonomy presented above.
3.1 Handcrafted Feature-based Models
Over the past years, various feature descriptors have been adopted for face anti-spoofing.
Basically, since fake facial images are fabricated through a camera (i.e., recaptured),
their high-frequency components are weaker than those of real facial images. Accordingly,
many algorithms operate in the spectral domain. For example, Zhang et al.
[14] used four DoG filters to obtain multiple spectral responses and input filtered images
to an SVM classifier. Tan et al. [15] combined textural patterns with DoG responses and extended the sparse logistic regression
classifier both nonlinearly and spatially to improve its generalization capability
for spectral features.
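As an illustration of this spectral strategy, the following sketch builds a difference-of-Gaussians (DoG) feature vector and passes it to an SVM. It is not the exact configuration of [14]; the filter bandwidths (sigma pairs) and crop size are assumed values used only for demonstration.

```python
# Minimal sketch of DoG-based spoof detection in the spirit of [14]:
# several band-pass responses are flattened and fed to an SVM classifier.
# The sigma pairs below are illustrative, not the settings of [14].
import cv2
import numpy as np
from sklearn.svm import SVC

SIGMA_PAIRS = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0), (2.0, 4.0)]  # four DoG bands

def dog_features(gray_face):
    """Stack band-pass responses of a grayscale face crop into one vector."""
    img = gray_face.astype(np.float32) / 255.0
    bands = []
    for s1, s2 in SIGMA_PAIRS:
        low = cv2.GaussianBlur(img, (0, 0), s1)    # kernel size derived from sigma
        high = cv2.GaussianBlur(img, (0, 0), s2)
        bands.append((low - high).ravel())          # band-pass (DoG) response
    return np.concatenate(bands)

# Assumed training data: X_train is a list of face crops, y_train uses 1 = real, 0 = fake.
# clf = SVC(kernel="rbf").fit([dog_features(x) for x in X_train], y_train)
# label = clf.predict([dog_features(test_crop)])
```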
The degree of directional coherence is also a useful clue since the fabrication process
weakens the gradient magnitude. Yang et al. [22] divided a given facial image into sub-components (eyes, mouth, nose, etc.) and extracted
HOG from each region to represent directional properties. The spectral loss from fabrication
leads to degradation of the image quality, so there have been meaningful approaches
that employ the scores of image quality metrics as feature vectors. Galbally and Marcel
[17] adopted multiple image quality metrics and aggregated their scores to form a feature vector,
which was fed into an SVM classifier for training and testing. This strategy is not
limited to the specific modality of biometrics, so it can be applied to other authentication
systems, such as fingerprint and iris systems [18].
Many studies have shown that LBPs on the facial surface are effective for revealing the
subtle differences between real and fake facial images. Määttä et al. [1] simply computed LBP histograms from multi-scale levels and concatenated corresponding
outputs as a feature descriptor for a given facial image. Chingovska et al. [20] similarly extracted LBP features in both global and local regions and computed the
final spoofing score from the ensemble of outputs from global and local classifiers.
Another study [21] extracted LBP histograms from the spatio-temporal plane to consider textural variations
along the timeline. Patel et al. [4] combined multi-scale LBP histograms with color moments and generated a single feature
vector to account for both color characteristics and textural patterns.
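The multi-scale LBP descriptor described above can be sketched as follows. This is a simplified illustration rather than the exact setting of [1]; the (P, R) scales are assumed values.

```python
# Hedged sketch of a multi-scale LBP descriptor in the spirit of [1]:
# uniform LBP histograms at several radii are concatenated into one
# feature vector, which is then fed to a classifier (e.g., an SVM).
import numpy as np
from skimage.feature import local_binary_pattern

LBP_SCALES = [(8, 1), (8, 2), (16, 2)]  # (neighbors P, radius R), illustrative values

def multiscale_lbp_histogram(gray_face):
    feats = []
    for P, R in LBP_SCALES:
        codes = local_binary_pattern(gray_face, P, R, method="uniform")
        n_bins = P + 2  # P+1 uniform patterns plus one "non-uniform" bin
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        feats.append(hist)
    return np.concatenate(feats)  # downstream classifier decides real vs. fake
```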
Inspired by the significant improvement by encoding intensity patterns in a small
local region, several variants have been devised with other values. For example, responses
of multiple Gabor filters are encoded in the same way as LBPs for face anti-spoofing
[23]. From a similar point of view, face anti-spoofing can be done with local phase quantization
(LPQ) [34], which represents the textural patterns by quantizing the image spectrum (i.e., local
spectral coefficients). The diffusion characteristics are quite different between
real and fake facial images, so LSPs were introduced, which showed notable improvement
on a mobile device [24]. LBP and its variants are easy to implement and fast, but their
performance has reached its limit as spoofing attacks diversify day by day.
3.2 Learned Feature-based Models
Thanks to the great success of DNNs in the field of computer vision, many researchers
have begun to rely on learned features to efficiently encode diverse properties
of real and fake facial images in the embedding space. Since the problem of face anti-spoofing
can be regarded as binary classification, traditional CNNs, which have shown successful
performance in image classification, were applied to this task first. Li et al.
[35] directly employed a VGG-face network to extract facial features and conduct a subspace
analysis to refine such features for computing an accurate score for face anti-spoofing.
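In its simplest form, this binary formulation amounts to fine-tuning a pretrained backbone with a two-class head, as sketched below. The snippet is a generic illustration rather than the exact pipeline of [35]; the choice of ResNet-18, the optimizer, and the learning rate are assumptions.

```python
# Generic sketch of face anti-spoofing as binary classification:
# fine-tune a pretrained backbone with a two-class head and cross-entropy loss.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)        # two classes: real vs. fake

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step; labels are class indices (1 = real, 0 = fake)."""
    logits = model(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```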
To further improve the performance, auxiliary information has been combined with the
network. For example, Atoum et al. [5] proposed extracting semantic features from facial patches and simultaneously estimating
depth values in a pixel-wise manner based on the two-stream network architecture.
Liu et al. [7] designed an auxiliary supervision scheme by incorporating the rPPG signal and depth
map into convolutional and recurrent neural networks. Such depth and rPPG signals
have been popular to guide the learning procedure more accurately [8,27].
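The auxiliary-supervision idea can be summarized by a network with two heads, one for the binary decision and one for a pixel-wise depth map, trained with a combined loss. The sketch below is heavily simplified with respect to [5,7]; the layer sizes, the flat depth target for fake faces, and the weighting factor are illustrative assumptions.

```python
# Simplified sketch of auxiliary depth supervision: a shared encoder feeds a
# binary head and a pixel-wise depth head, and the two losses are combined.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(64, 1, 1)                     # pixel-wise depth map
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):
        f = self.encoder(x)
        return self.cls_head(f), self.depth_head(f)

def loss_fn(logit, depth_pred, label, depth_gt, lam=0.5):
    # Real faces are supervised with an estimated depth map and fake faces with a
    # flat (all-zero) map, following the auxiliary-supervision idea; lam is assumed.
    cls_loss = F.binary_cross_entropy_with_logits(logit, label)
    depth_loss = F.mse_loss(depth_pred, depth_gt)
    return cls_loss + lam * depth_loss
```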
There have been several attempts to explicitly estimate spoofing noise through the
encoder-decoder architecture. Jourabloo et al. [28] defined a new degradation model with spoofing noise. Based on this model, they designed
a de-spoofing network to implicitly estimate the spoofing noise. Feng et al. [29] proposed a two-stage network composed of a spoof cue generator and auxiliary classifier.
They did not impose any explicit constraint on spoof cues by spoofing samples for
the generator to be generalized well against unseen attacks. Even though learned feature-based
approaches bring significant improvement with various DNN architectures, their performance
is limited to the dataset used for training, so the performance is hardly guaranteed
when unseen samples are given for testing.
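Conceptually, the de-spoofing idea can be reduced to the sketch below: a small network predicts a pixel-wise noise map, the input minus this map approximates the "live" face, and the magnitude of the map serves as a spoofing score. The architecture and scoring rule are illustrative and much simpler than those in [28,29].

```python
# Loose sketch of the spoof-noise idea: estimate a pixel-wise noise map,
# subtract it to obtain a "de-spoofed" face, and score spoofing by its magnitude.
import torch
import torch.nn as nn

class DeSpoofNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),              # estimated spoof noise map
        )

    def forward(self, x):
        noise = self.net(x)
        live = x - noise                                  # reconstructed "live" face
        score = noise.abs().mean(dim=(1, 2, 3))           # larger score -> more likely spoof
        return live, noise, score
```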
Most recently, the research trend has been going toward learning generalized spoofing
cues. Li et al. [10] trained a mapping function to align a source domain with a target one using
the maximum mean discrepancy (MMD) metric in the embedding feature space. Shao et
al. [31] proposed learning domain-specific feature extractors separately with corresponding
discriminators. One generator is simultaneously trained to provide the generalized
feature space by adopting an adversarial learning strategy with domain-specific discriminators.
Similarly, Jia et al. [11] also focused on constructing a generalized feature space by only using real facial
images with adversarial loss.
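For reference, an MMD term of the kind used in [10] can be written as a kernel two-sample statistic between mini-batches of source and target features. A minimal sketch is given below; the RBF bandwidth is an assumed constant.

```python
# Minimal squared-MMD estimate between source and target feature batches,
# which can be minimized alongside the classification loss to align domains.
import torch

def rbf_kernel(a, b, sigma=1.0):
    d2 = torch.cdist(a, b) ** 2                 # pairwise squared Euclidean distances
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd_loss(source_feats, target_feats, sigma=1.0):
    k_ss = rbf_kernel(source_feats, source_feats, sigma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, sigma).mean()
    k_st = rbf_kernel(source_feats, target_feats, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st             # squared MMD estimate
```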
These two methods adopt the triplet loss concept for training to maximize the distance
between real and fake features in the embedding space. Such methods show effective
generalization ability based on the performance evaluation in the inter-dataset test,
such as training the model on dataset A and testing it on dataset B. However, they
still require fake samples, which inevitably come from a limited number of benchmark
datasets. Therefore, learning generalized spoofing cues cannot be considered to fully
cover the diverse types of spoofing attacks occurring in real-world scenarios.
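A minimal form of this triplet constraint is sketched below, where a real face serves as the anchor, another real face as the positive, and a fake face as the negative; the margin value is an assumption.

```python
# Minimal triplet constraint: pull real-face embeddings together and push
# fake-face embeddings away by at least a margin in the feature space.
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    d_pos = F.pairwise_distance(anchor, positive)   # real vs. real
    d_neg = F.pairwise_distance(anchor, negative)   # real vs. fake
    return F.relu(d_pos - d_neg + margin).mean()
```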
A few attempts for one-class learning have been made to ultimately accomplish face
anti-spoofing regardless of the type of spoofing attack. The heart of this approach
is to use only real facial images since it is almost impossible to allow for all types
of spoofing attacks. As a pioneering work, Xiong et al. [12] assumed that, for an autoencoder
trained only on real facial images, the reconstruction error of a spoofing input is relatively
large compared to that of a real facial input. To verify this, they used a multi-layer perceptron-based
architecture as the autoencoder and showed the potential of the neural network-based
one-class learning scheme. Moreover, the concept of outlier detection was also introduced
and tested based on a one-class SVM classifier and Gaussian mixture model (GMM) [12].
In line with this research direction, Lim et al. [32] proposed a novel feature correlation network (FCN) to precisely compute the similarity
with features of real facial images, which are learned using deep dual generators.
George and Marcel [33] adopted multiple modalities (color, infrared, depth, and thermal inputs) and conducted
contrastive learning based on a simple CNN with both real and spoofed facial samples.
Encoded features are employed to simultaneously learn GMM. In the inference phase,
features extracted from the network are compared with each center of Gaussians to
determine whether a given input is an outlier (i.e., spoofing sample) or not. Even
though one-class learning approaches are most suitable for real-world applications,
the accuracy of spoofing detection is inferior to the case of using both real and
fake facial samples for training the model.
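The reconstruction-based variant of this one-class scheme can be sketched as follows; the network size and decision threshold are illustrative and do not correspond to the exact settings of [12].

```python
# One-class sketch: an autoencoder is trained only on real faces, and a test
# sample with a large reconstruction error is flagged as a possible spoof.
import torch
import torch.nn as nn

class FaceAutoencoder(nn.Module):
    def __init__(self, dim=64 * 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 32))
        self.decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def is_spoof(model, face_vec, threshold=0.05):
    """Flag a flattened face crop as spoof when its reconstruction error is large."""
    with torch.no_grad():
        recon = model(face_vec)
        error = torch.mean((recon - face_vec) ** 2).item()
    return error > threshold   # threshold is an assumed, dataset-dependent value
```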
Fig. 2. Samples from the SiW-M dataset (13 types of spoofing attacks) [9].
Table 2. Summary of published datasets for face anti-spoofing.
Datasets | # of Subjects | # of Videos | Spoof Types | Modality
NUAA | 15 | 12,614 (images) | Print | RGB
Replay-Attack | 50 | 1,200 | Print, Replay | RGB
CASIA-FASD | 50 | 600 | Print, 2D Mask (Cut), Replay | RGB
MSU-MFSD | 55 | 440 | Print, Replay | RGB
OULU-NPU | 55 | 5,940 | Print, Replay | RGB
SiW-M | 493 | 1,630 | Print, Replay, 3D Mask, Make-up, 2D Partial Mask | RGB
CASIA-SURF | 1,000 | 21,000 | Print, 2D Mask (Cut) | RGB / IR / Depth
4. Face Anti-spoofing: Performance
In this section, we introduce several benchmark datasets, which have been most widely
employed for face anti-spoofing, along with criteria for performance evaluation. Based
on this, some visual results on a mobile device are presented with an example of qualitative
evaluation.
4.1 Benchmark Datasets
In order to fairly report the performance of face anti-spoofing, the NUAA dataset
[15] was constructed first. This dataset is composed of 15 subjects who are positioned
in front of a web camera with a neutral expression. To make spoofing samples, pictures
of subjects are printed on photographic paper and normal A4 paper. The total numbers
of real and fake facial images in the NUAA dataset are 5,105 and 7,509, respectively. The
facial region of each picture is cropped by a Viola-Jones detector and normalized to 64x64
pixels based on an eye localizer. The NUAA dataset made much sense as a first
step, but its limited variety of forgery attacks needed to be addressed.
Many other benchmark datasets have been actively constructed for face anti-spoofing.
The Replay-Attack dataset [20] comprises 1,300 video clips with resolution of 320x240 pixels. These videos are acquired
from 50 people under different lighting conditions. This dataset contains three types
of spoofing attacks, which are made with printed paper, smartphone screens, and high-resolution
tablet screens. In order to consider more real-world environments, video clips are
captured in two different ways: fixed-support and hand-held methods.
Similarly, the CASIA-FASD dataset [14] is constructed with 50 subjects with paper and screen-based fabrications. The MSU-MFSD
dataset [19] is built with more spoofing media to reflect realistic situations effectively.
For example, notebook PCs, different types of smartphones, tablets, and paper are used
for generating real and fake facial images. More recently, the OULU-NPU dataset [36] has been introduced and is one of the largest datasets for face anti-spoofing. This
dataset consists of 5,940 video clips acquired from 55 subjects with six different
smartphones. To check the performance with various viewpoints, face anti-spoofing
methods are tested via four unknown presentation attack detection (UPAD) evaluation
protocols on the OULU-NPU dataset.
With increasing types of spoofing attacks, studies for face anti-spoofing are starting
to require diversified datasets rather than just showing a large number of video clips.
Most representatively, the SiW-M dataset [9] has been introduced and contains 13 types of spoofing attacks: replay, print, half
mask, silicone, transparent, papercraft, mannequin, obfuscation, cosmetic, impersonation,
funny eye, paper glasses, and partial paper attacks. Therefore, recent methods mostly
employ the SiW-M dataset to show their generalization abilities (i.e., robustness
to unseen attacks). Some samples from the SiW-M dataset are shown in Fig. 2.
Multiple modalities are also considered to improve the performance of face anti-spoofing.
To this end, the CASIA-SURF dataset [37] was constructed with color, infrared, and depth sensors. The multi-modal data were
taken from 1,000 subjects, making it a very large-scale dataset. A summary of these
published datasets is shown in Table 2.
4.2 Evaluation Metrics
Several quantitative metrics have been used to fairly compare the performance of face
anti-spoofing methods based on such benchmark datasets. The area under the receiver
operating characteristic curve (AUC) is one of the most widely employed metrics. The
half total error rate (HTER) is computed using the average value of the false rejection
rate (FRR) and false acceptance rate (FAR) and has also been popular for this task.
The equal error rate (EER) is simultaneously computed by finding a point where the
FAR value equals the FRR value.
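In formula form, these definitions read $\mathrm{HTER} = (\mathrm{FAR} + \mathrm{FRR})/2$, while the EER is the common value $\mathrm{FAR}(\tau^{*}) = \mathrm{FRR}(\tau^{*})$ at the decision threshold $\tau^{*}$ where the two error rates coincide.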
The attack presentation classification error rate (APCER) denotes the proportion of
spoofing samples that are misclassified as real ones, while the bona fide presentation
classification error rate (BPCER) indicates the proportion of real facial samples
incorrectly detected as spoofing attacks [38]. To account for the trade-off between APCER and BPCER, the BPCER20 metric
has also been used; it indicates the BPCER value when APCER is fixed at
5\%. Based on these metrics, the performance of face anti-spoofing methods has been
fairly verified and compared.
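Given per-sample scores and ground-truth labels, these rates can be computed in a few lines. The sketch below is illustrative and assumes the convention that higher scores indicate real faces and that a decision threshold has already been chosen.

```python
# Illustrative helper (not tied to any specific benchmark toolkit) that turns
# per-sample spoofing scores into the error rates defined above.
import numpy as np

def error_rates(scores, labels, threshold):
    """labels: 1 = real (bona fide), 0 = attack; higher score = more likely real."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pred_real = scores >= threshold
    attacks, reals = labels == 0, labels == 1
    apcer = np.mean(pred_real[attacks])      # attacks accepted as real
    bpcer = np.mean(~pred_real[reals])       # real faces rejected as attacks
    hter = (apcer + bpcer) / 2.0             # APCER/BPCER playing the roles of FAR/FRR
    return apcer, bpcer, hter
```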
Fig. 3. Some examples of face anti-spoofing [32] (green: real faces, red: fake faces).
4.3 Overall Performance
The aim of this review is to introduce methodologies of face anti-spoofing according
to the systematic taxonomy and to give constructive prospects for future research
directions. Therefore, a detailed performance analysis and comparison of each method
is not dealt with in this paper. Instead, several results of face
anti-spoofing are demonstrated with pictorial examples in this subsection. Before
checking face anti-spoofing results, the overall procedure in real-world scenarios
is summarized as follows: the facial region is detected from a captured image and
normalized to a specific resolution. To do this, various algorithms can be adopted,
such as MTCNN [39], TinyFace [40], etc. The corresponding result is fed into a classifier (e.g., SVM, DNN, etc.) to
determine whether the cropped and normalized region contains a real face or not.
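A high-level sketch of this deployment pipeline is given below. The detector and classifier interfaces (detect, score) are assumed placeholders for any of the algorithms mentioned above (e.g., MTCNN [39] for detection and one of the models from Section 3 for classification); the crop size and score threshold are likewise illustrative.

```python
# High-level sketch of the deployment pipeline: detect the face, crop and
# normalize it, then pass the crop to an anti-spoofing classifier.
import cv2

def check_liveness(frame_bgr, face_detector, spoof_classifier, size=(224, 224)):
    boxes = face_detector.detect(frame_bgr)              # assumed detector interface
    results = []
    for (x, y, w, h) in boxes:
        crop = cv2.resize(frame_bgr[y:y + h, x:x + w], size)
        score = spoof_classifier.score(crop)              # assumed: higher = more likely real
        results.append(((x, y, w, h), score > 0.5))       # True -> treated as a real face
    return results
```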
Some pictorial examples of face anti-spoofing are shown in Fig. 3. The first two rows show results for test samples from the CASIA-FASD dataset, while
results for the Replay-Attack dataset are shown in the last two rows. These results
were generated by the one-class learning method [32]. Other approaches also yield the same type of output (a facial region with marked
label). As an example of quantitative evaluation, the performance on the OULU-NPU
dataset with four UPAD protocols is also shown in Table 3. A motion blur-based face anti-spoofing method [41] is additionally evaluated for this test since it shows high accuracy. Based on evaluation
metrics mentioned in the previous subsection, all the methods for face anti-spoofing
can be reliably verified.
Table 3. Performance on the OULU-NPU dataset.
Prot. | Method | APCER (%) | BPCER (%)
1 | Auxiliary [7] | 1.6 | 1.6
1 | MADDoG [31] | 10.6 | 14.3
1 | Motion [41] | 7.7 | 10.5
1 | LGSC [29] | 7.7 | 12.3
1 | Material [30] | 0.0 | 1.6
2 | Auxiliary [7] | 2.7 | 2.7
2 | MADDoG [31] | 4.5 | 8.4
2 | Motion [41] | 5.4 | 5.7
2 | LGSC [29] | 2.6 | 3.5
2 | Material [30] | 2.6 | 0.8
3 | Auxiliary [7] | 2.7±1.3 | 3.1±1.7
3 | MADDoG [31] | 7.4±5.7 | 10.8±9.8
3 | Motion [41] | 5.1±4.9 | 10.4±10.1
3 | LGSC [29] | 4.1±4.6 | 6.5±8.1
3 | Material [30] | 2.8±2.4 | 2.3±2.8
4 | Auxiliary [7] | 9.3±5.6 | 10.4±6.0
4 | MADDoG [31] | 4.0±3.5 | 9.1±8.0
4 | Motion [41] | 3.9±2.4 | 7.9±6.7
4 | LGSC [29] | 2.7±1.7 | 7.9±7.1
4 | Material [30] | 2.9±4.0 | 7.5±6.9
4.4 Discussion
Even though the accuracy of face anti-spoofing on benchmark datasets is improving
quickly, previous methods are rarely applied in practical applications. For example,
face verification modules are still used on smartphones without face anti-spoofing,
so a printed photo can pass authentication on another person's device. Therefore,
beyond the laboratory environment, the face anti-spoofing community now needs
to prepare solutions so that the developed methods can be deployed in real-world scenarios.
To this end, research is expected to go in the following directions:
$\textbf{· Generalization:}$ As explained above, the biggest obstacle to commercialization
of face anti-spoofing is vulnerability to unseen attacks. Even though learned feature-based
approaches show promising results for intra-testing on various benchmark datasets,
the performance of such trained models is still limited to the given domain properties
(i.e., they still suffer from unfamiliar distributions frequently occurring in real-world
environments), and thus the accuracy of forgery detection drops significantly. Therefore,
it is highly desirable for future studies to focus on generalizing spoof cues in the
embedding space. It would also be very helpful to apply the concept of outlier detection
to the problem of face anti-spoofing, so learning schemes with only real facial images
(i.e., one-class learning) also need to be explored in depth.
$\textbf{· Stability:}$ Most methods for face anti-spoofing often show inconsistent
results for image sequences. This instability is also an important factor that hinders
the commercialization of face anti-spoofing methods. Several methods have considered
temporal affinity in the network architecture [42,43], but a light-weight model still needs to be developed for applications on embedded
platforms. Therefore, future work needs to allow for robustness against unexpected
artifacts by camera motions in a very efficient way.
$\textbf{· Diversification:}$ From the viewpoint of the dataset, more subjects and
types of spoofing attacks are required. Complying with this need, the SiW-M dataset
[9] has already been constructed and is popular for performance evaluation. Constructing
a dataset that is wider (more types of spoofing attacks) and deeper (more subjects)
is time-consuming and labor-intensive, but it is essential to consistently build such
datasets for DNN-based methods to be more reliable under diverse environments.
In addition to prospects mentioned above, practical studies should not be forgotten.
The ultimate goal of face anti-spoofing is to be used with face verification systems
on various embedded platforms, especially smartphones. To this end, a detailed analysis
of the processing time and memory usage of each method is highly required.
Some examples of face anti-spoofing results on mobile devices are shown in
Fig. 4. In summary, many challenging issues are still unresolved, but they will lead to
a new generation of more reliable and efficient methods for face anti-spoofing in
coming years.
Fig. 4. Test results under real-world environments.
5. Conclusion
In this paper, a comprehensive review for face anti-spoofing was given with a systematic
taxonomy. Various methods for face anti-spoofing were categorized into two main groups:
handcrafted feature-based and learned feature-based approaches. The strengths and
drawbacks of each group were appropriately analyzed in accordance with the research
trend of face anti-spoofing. Representative methods in each group were also explained
for beginners or experts to gain a general understanding of this task. Moreover,
benchmark datasets and evaluation metrics were introduced, followed by several experimental
results of face anti-spoofing. Based on the detailed analysis, prospects for realization
of face anti-spoofing on various embedded platforms were discussed. This review can
provide practical guidance for experts and newcomers contributing to this topic.
ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant,
which is funded by the Korean government (MSIT) (No. 2020R1F1A1068080).
REFERENCES
Määttä J., Hadid A., Pietikäinen M., Oct. 2011, Face spoofing detection from single
images using micro-texture analysis, in Proc. IEEE Int. Joint Conf. Biometrics (IJCB),
pp. 1-7
Han H., Klare B. F., Bonnen K., Jain A. K., Jan. 2013, Matching composite sketches
to face photos: a component-based approach, IEEE Trans. Inf. Forensics Security, Vol.
8, No. 1, pp. 191-204
Boulkenafet Z., Komulainen J., Hadid A., Aug. 2016, Face spoofing detection using
colour texture analysis, IEEE Trans. Inf. Forensics Security, Vol. 11, No. 8, pp.
1818-1830
Patel K., Han H., Jain A. K., Oct. 2016, Secure face unlock: spoof detection on smartphones,
IEEE Trans. Inf. Forensics Security, Vol. 11, No. 10, pp. 2268-2283
Atoum Y., Liu Y., Jourabloo A., Liu X., Oct. 2017, Face anti-spoofing using patch
and depth-based CNNs, in Proc. IEEE Int. Joint Conf. Biometrics (IJCB), pp. 319-328
Yu Z., Zhao C., Wang Z., Qin Y., Su Z., Li X., Zhou F., Zhao G., Jun. 2020, Searching
central difference convolutional networks for face anti-spoofing, in Proc. IEEE Int.
Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 5295-5305
Liu Y., Jourabloo A., Liu X., Jun. 2018, Learning deep models for face anti-spoofing:
Binary or auxiliary supervision, in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit.
(CVPR), pp. 389-398
Lin B., Li X., Yu Z., Zhao G., May 2019, Face liveness detection by rPPG features
and contextual patch-based CNN, in Proc. ICBEA, pp. 61-68
Liu Y., Stehouwer J., Jourabloo A., Liu X., Jun. 2019, Deep tree learning for zero-shot
face anti-spoofing, in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR),
pp. 4675-4684
Li H., Li W., Cao H., Wang S., Huang F., Kot A. C., Jul. 2018, Unsupervised domain
adaptation for face anti-spoofing, IEEE Trans. Inf. Forensics Security, Vol. 13, No.
7, pp. 1794-1809
Jia Y., Zhang J., Shan S., Chen X., Jun. 2020, Single-side domain generalization for
face anti-spoofing, in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR),
pp. 8481-8490
Xiong F., AbdAlmageed W., Oct 2018, Unknown presentation attack detection with face
RGB images, in Proc. IEEE Int. Conf. Biometrics: Theory Appl. Syst. (BTAS), pp. 1-9
Li J., Wang Y., Tan T., Jain A. K., Aug. 2004, Live face detection based on the analysis
of Fourier spectra, in Proc. SPIE, Biometric Technology for Human Identification, pp. 296-303
Zhang Z., Yan J., Liu S., Lei Z., Yi D., Li S. Z., Mar./Apr. 2012, A face antispoofing
database with diverse attacks, in Proc. IAPR Int. Conf. Biometrics (ICB), pp. 26-31
Tan X., Li Y., Liu J., Jiang L., Sep. 2010, Face liveness detection from a single
image with sparse low rank bilinear discriminative model, in Proc. Eur. Conf. Comput.
Vis. (ECCV), pp. 504-517
Peixoto B., Michelassi C., Rocha A., Sep. 2011, Face liveness detection under bad
illumination conditions, in Proc. IEEE Int. Conf. Image Process. (ICIP), pp. 3557-3560
Galbally J., Marcel S., Aug. 2014, Face anti-spoofing based on general image quality
assessment, in Proc. IAPR Int. Conf. Pattern Recognit. (ICPR), pp. 1173-1178
Galbally J., Marcel S., Fierrez J., Feb. 2014, Image quality assessment for fake biometric
detection: Application to iris, fingerprint, and face recognition, IEEE Trans. Image
Process., Vol. 23, No. 2, pp. 710-724
Wen D., Han H., Jain A. K., Apr. 2015, Face spoof detection with image distortion
analysis, IEEE Trans. Inf. Forensics Security, Vol. 10, No. 4, pp. 746-761
Chingovska I., Anjos A., Marcel S., Sep. 2012, On the effectiveness of local binary
patterns in face anti-spoofing, in Proc. IEEE Int. Conf. Biometrics Special Interest
Group (BioSIG), pp. 1-7
de Freitas Pereira T., Anjos A., De Martino J. M., Marcel S., Nov. 2012, LBP-TOP based
countermeasure against face spoofing attacks, in Proc. Int. Workshop Comput. Vis.
Local Binary Pattern Variants (ACCV), pp. 121-132
Yang J., Lei Z., Liao S., Li S. Z., Jun. 2013, Face liveness detection with component
dependent descriptor, in Proc. IEEE Int. Conf. Biometrics (ICB), pp. 1-6
Chingovska I., Anjos A. R. D., Marcel S., Dec. 2014, Biometrics evaluation under spoofing
attacks, IEEE Trans. Inf. Forensics Security, Vol. 9, No. 12, pp. 2264-2276
Kim W., Suh S., Han J-J., Aug. 2015, Face liveness detection from a single image via
diffusion speed model, IEEE Trans. Image Process., Vol. 24, No. 8, pp. 2456-2465
Simonyan K., Zisserman A., Dec. 2015, Very deep convolutional networks for large-scale
image recognition, in Proc. Int. Conf. Learn. Represent. (ICLR), pp. 1-14
He K., Zhang X., Ren S., Sun J., Jun. 2016, Deep residual learning for image recognition,
in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 770-778
Feng L., Po L. M., Li Y., Xu X., Yuan F., Cheung T. C. H., Cheung K.-W., Jul. 2016,
Integration of image quality and motion cues for face antispoofing: A neural network
approach, J. Vis. Commun. Image Represent., Vol. 38, pp. 451-460
Jourabloo A., Liu Y., Liu X., Sep. 2018, Face de-spoofing: anti-spoofing via noise
modeling, in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 1-17
Feng H., Hong Z., Yue H., Chen Y., Wang K., Han J., Liu J., Ding E., 2020, Learning
generalized spoof cues for face anti-spoofing, arXiv preprint arXiv:2005.03922
Yu Z., Li X., Niu X., Shi J., Zhao G., Aug. 2020, Face anti-spoofing with human material
perception, in Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 1-19
Shao R., Lan X., Li J., Yuen P. C., Jun. 2019, Multi-adversarial discriminative deep
domain generalization for face presentation attack detection, in Proc. IEEE Int. Conf.
Comput. Vis. Pattern Recognit. (CVPR), pp. 10023-10031
Lim S., Gwak Y., Kim W., Roh J-H., Cho S., Dec. 2020, One-class learning method based
on live correlation loss for face anti-spoofing, IEEE Access, Vol. 8, pp. 201635-201648
George A., Marcel S., 2021, Learning one class representation for face presentation
attack detection using multi-channel convolutional neural networks, IEEE Trans. Inf.
Forensics Security, Vol. 16, No. 1, pp. 361-375
Ghiani L., Marcialis G., Roli F., Nov. 2012, Fingerprint liveness detection by local
phase quantization, in Proc. IAPR Int. Conf. on Pattern Recognit. (ICPR), pp. 537-540
Li L., Feng X., Boulkenafet Z., Xia Z., Li M., Hadid A., Dec. 2016, An original face
anti-spoofing approach using partial convolutional neural network, in Proc. Int. Conf.
Image Process. Theory, Tools Appl. (IPTA), pp. 1-6
Boulkenafet Z., Komulainen J., Li L., Feng X., Hadid A., May 2017, OULU-NPU: A mobile
face presentation attack database with real-world variations, in Proc. IEEE Int. Conf.
Autom. Face Gesture Recognit. (FG), pp. 612-618
Zhang S., Liu A., Wan J., Liang Y., Guo G., Escalera S., Escalante H. J., Li S. Z.,
Apr. 2020, CASIA-SURF: a large-scale multi-modal benchmark for face anti-spoofing,
IEEE Trans. Bio. Behavior Iden. Sci., Vol. 2, No. 2, pp. 182-193
ISO/IEC JTC 1/SC 37 Biometrics, 2016, Information technology - Biometric presentation
attack detection - Part 1: Framework, International Organization for Standardization
Zhang K., Zhang Z., Li Z., Qiao Y., Oct. 2016, Joint face detection and alignment
using multitask cascaded convolutional networks, IEEE Signal Process. Lett., Vol.
23, No. 10, pp. 1499-1503
Hu P., Ramanan D., Jun 2017, Finding tiny faces, in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit. (CVPR), pp. 1522-1530
Li L., Xia Z., Hadid A., Jiang X., Zhang H., Feng X., Sep. 2019, Replayed video attack
detection based on motion blur analysis, IEEE Trans. Inf. Forensics Security, Vol.
14, No. 9, pp. 2246-2261
Wang Z., Zhao C., Qin Y., Zhou Q., Qi G., Wan J., Lei Z., 2018, Exploiting temporal
and depth information for multi-frame face anti-spoofing, arXiv preprint arXiv:1811.05118
Yang X., Luo W., Bao L., Gao Y., Gong D., Zheng S., Li Z., Lei W., Jun. 2019, Face
anti-spoofing: model matters, so does data, in Proc. IEEE Int. Conf. Comput. Vis.
Pattern Recognit. (CVPR), pp. 3507-3516
Author
Wonjun Kim received a B.S. degree from the Department of Electronic Engineering,
Sogang University, Seoul, South Korea, in 2006, an M.S. degree from the Department
of Information and Communications, Korea Advanced Institute of Science and Technology
(KAIST), Daejeon, South Korea, in 2008, and a Ph.D. degree from the Department of
Electrical Engineering, KAIST, in 2012. From September 2012 to February 2016, he was
a Research Staff Member of the Samsung Advanced Institute of Technology (SAIT), South
Korea. Since March 2016, he has been with the Department of Electrical and Electronics
Engineering, Konkuk University, Seoul, where he is currently an associate professor.
His research interests include image and video understanding, computer vision, pattern
recognition, and biometrics, with an emphasis on background subtraction, saliency
detection, face, and action recognition. He has served as a regular reviewer for over
30 international journals, including the IEEE Transactions on Image Processing,
IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions
on Multimedia, IEEE Transactions on Cybernetics, IEEE Access, IEEE Signal Processing
Letters, and so on.