An improved normalized gain-based score normalization technique for spoof detection algorithm
Keywords:spoof detection, speaker verification, score normalization, replay attack, voice conversion, speech processing
A spoof detection algorithm supports the speaker verification system to examine the false claims by an imposter through careful analysis of input test speech. The scores are employed to categorize the genuine and spoofed samples effectively. Under the mismatch conditions, the false acceptance ratio increases and can be reduced by appropriate score normalization techniques. In this article, we are using the normalized Discounted Cumulative Gain (nDCG) norm derived from ranking the speaker’s log-likelihood scores. The proposed scoring technique smoothens the decaying process due to logarithm with an added advantage from the ranking. The baseline spoof detection system employs Constant Q-Cepstral Co-efficient (CQCC) as the base features with a Gaussian Mixture Model (GMM) based classifier. The scores are computed using the ASVspoof 2019 dataset for normalized and without normalization conditions. The baseline techniques including the Zero normalization (Z-norm) and Test normalization (T-norm) are also considered. The proposed technique is found to perform better in terms of improved Equal Error Rate (EER) of 0.35 as against 0.43 for baseline system (no normalization) wrt to synthetic attacks using development data. Similarly, improvements are seen in the case of replay attack with EER of 7.83 for nDCG-norm and 9.87 with no normalization (no-norm). Furthermore, the tandem-Detection Cost Function (t-DCF) scores for synthetic attack are 0.015 for no-norm and 0.010 for proposed normalization. Additionally, for the replay attack the t-DCF scores are 0.195 for no-norm and 0.17 proposed normalization. The system performance is satisfactory when evaluated using evaluation data with EER of 8.96 for nDCG-norm as against 9.57 with no-norm for synthetic attacks while the EER of 9.79 for nDCG-norm as against 11.04 with no-norm for replay attacks. Supporting the EER, the t-DCF for nDCG-norm is 0.1989 and for no-norm is 0.2636 for synthetic attacks; while in case of replay attacks, the t-DCF is 0.2284 for the nDCG-norm and 0.2454 for no-norm. The proposed scoring technique is found to increase spoof detection accuracy and overall accuracy of speaker verification system.