||Synergy in Voice and Lip Movement for Automatic Person Recognition
||Sumita Nainan; Vaishali Kulkarni
|| Multimodal biometrics; MFCC; HOG; VQ; GMM
||Biometric systems for automatic person recognition (APR) require stringent accuracy parameters. Unimodal systems based on a single biometric trait, however, suffer from limited flexibility, noisy data, and intra-class variations, and they are prone to spoof attacks; hence they do not yield the desired accuracy. Multimodal biometric systems compensate for these limitations to a certain extent. This paper presents a novel multimodal biometric framework combining text-independent voice with the accompanying lip movements as the two biometric traits.
Mel frequency cepstral coefficient (MFCC) features are extracted from the voice modality, and histogram of oriented gradients (HOG) features are extracted from the lip movements. Vector quantization (VQ) and the Gaussian mixture model (GMM) are employed as classifiers to create models of the training data. Open-set and closed-set identification techniques are implemented to obtain the APR performance parameters for the individual traits. The summation rule (sum rule) is then applied to the decisions obtained for the individual traits in order to implement the multimodal biometric framework.
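The sum-rule fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the match scores, the number of enrolled speakers, and the min-max normalization used to make the two modalities comparable are all assumptions introduced here.

```python
import numpy as np

def min_max_normalize(scores):
    """Scale per-speaker match scores to [0, 1] so the two
    modalities contribute on a comparable scale before fusion."""
    s = np.asarray(scores, dtype=float)
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

def sum_rule_fuse(voice_scores, lip_scores):
    """Sum rule: add the normalized per-speaker scores from the
    voice and lip modalities, then pick the best-scoring identity."""
    fused = min_max_normalize(voice_scores) + min_max_normalize(lip_scores)
    return int(np.argmax(fused)), fused

# Hypothetical match scores for four enrolled speakers
# (e.g. rescaled GMM likelihoods for voice, rescaled VQ
# similarities for lips -- illustrative values only).
voice = [0.2, 0.9, 0.4, 0.1]
lips = [0.3, 0.7, 0.8, 0.2]
identity, fused = sum_rule_fuse(voice, lips)
# identity -> index of the recognized speaker
```

Here the voice modality alone and the lip modality alone would rank the candidates differently; summing the normalized scores lets agreement between the two traits dominate, which is the intuition behind the accuracy gain reported for the fused framework.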
The multimodal framework achieved an APR accuracy of 98%, an improvement of almost 15% over the results obtained using the individual biometric traits.