Title Synergy in Voice and Lip Movement for Automatic Person Recognition
Authors Sumita Nainan; Vaishali Kulkarni
Pages 279-289
ISSN 2287-5255
Keywords Multimodal biometrics; MFCC; HOG; VQ; GMM
Abstract Biometric systems for automatic person recognition (APR) must meet stringent accuracy requirements. Unimodal systems, based on a single biometric trait, suffer from limited flexibility, noisy data, and intra-class variations, and are prone to spoof attacks; they therefore do not yield the desired accuracy. Multimodal biometric systems compensate for these limitations to a certain extent. This paper presents a novel multimodal biometric framework that combines text-independent voice with the accompanying lip movements as the two biometric traits.
Mel frequency cepstral coefficient (MFCC) features are extracted from the voice modality, and histogram of oriented gradients (HOG) features are extracted from the lip movements. Vector quantization (VQ) and the Gaussian mixture model (GMM) are the classifiers employed to build models from the training data. Both open-set and closed-set identification are implemented to obtain the APR performance parameters for the individual traits. The summation rule (sum rule) is then applied to the decisions obtained for the individual traits to realize the multimodal framework.
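The GMM-plus-sum-rule stage described above can be illustrated with a minimal sketch. This is not the authors' code: the feature dimensions, component counts, and synthetic Gaussian data below are assumptions standing in for real MFCC and HOG vectors, and `sklearn`'s `GaussianMixture` stands in for whatever GMM implementation the paper used. One GMM is enrolled per person per modality, and a probe is identified by summing the two per-modality log-likelihood scores.

```python
# Hedged sketch of score-level sum-rule fusion for bimodal identification.
# Synthetic Gaussian clusters stand in for real MFCC (voice) and HOG (lip)
# feature vectors; dimensions 13 and 36 are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def enroll(features, n_components=2):
    """Fit one GMM on a person's training features for one modality."""
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(features)
    return gmm

# Per-person training features for each modality (two enrolled persons).
voice_train = {"A": rng.normal(0.0, 1.0, (200, 13)),
               "B": rng.normal(5.0, 1.0, (200, 13))}
lip_train = {"A": rng.normal(0.0, 1.0, (200, 36)),
             "B": rng.normal(5.0, 1.0, (200, 36))}

voice_models = {p: enroll(f) for p, f in voice_train.items()}
lip_models = {p: enroll(f) for p, f in lip_train.items()}

# A probe drawn from person "B"'s distributions in both modalities.
voice_probe = rng.normal(5.0, 1.0, (50, 13))
lip_probe = rng.normal(5.0, 1.0, (50, 36))

# Sum rule: add the per-modality average log-likelihoods for each person,
# then identify the probe as the person with the highest fused score.
fused = {p: voice_models[p].score(voice_probe) + lip_models[p].score(lip_probe)
         for p in voice_models}
identified = max(fused, key=fused.get)
print(identified)
```

In a closed-set test the maximum fused score directly names the identified person; an open-set variant would additionally compare that score against a rejection threshold before accepting the decision.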
The multimodal framework achieved an APR accuracy of 98%, an improvement of almost 15% over the results obtained with the individual biometric traits.