Enhancement in Speaker Identification through Feature Fusion using Advanced Dilated Convolution Neural Network

Hema Kumar Pentapati; Sridevi K

doi:10.32985/ijeces.14.3.8

Authors

Hema Kumar Pentapati, Department of Electrical, Electronics and Communication Engineering, GITAM School of Technology, Visakhapatnam-530045, India. ORCID: https://orcid.org/0000-0002-9373-9132
Dr. K. Sridevi, Department of Electrical, Electronics and Communication Engineering, GITAM School of Technology, Visakhapatnam-530045, India. ORCID: https://orcid.org/0000-0002-6716-6705

Keywords:

Log-MelSpectrum, MFCC, Speaker Identification, excitation features, Convolution Neural Network (CNN), LP Residual, deep learning, Deep Neural Network

Abstract

Identifying speakers accurately poses various challenges, and the extraction of discriminative features is vital for accurate speaker identification. Speaker identification is now widely investigated using deep learning. Complex and noisy speech data degrade the performance of Mel Frequency Cepstral Coefficients (MFCC), so MFCC fails to represent speaker characteristics accurately. In this work, a novel text-independent speaker identification system is developed that improves performance by fusing Log-MelSpectrum and excitation features. The excitation information arises from the vibration of the vocal folds and is represented using the Linear Prediction (LP) residual. The features extracted from the excitation are the residual phase, sharpness, Energy of Excitation (EoE), and Strength of Excitation (SoE). The extracted features are processed with a dilated convolution neural network (dilated CNN) to perform the identification task. Extensive evaluation showed that fusing the excitation features gives better results than existing methods. The proposed model reaches an accuracy of 94.12% for 11 complex classes and 91.34% for 80 speakers, and reduces the Equal Error Rate (EER) to 1.16%. The proposed model was tested on the LibriSpeech corpus using MATLAB R2021b and outperforms the existing baseline models, achieving an accuracy improvement of 1.34% over the baseline system.
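To make the Log-MelSpectrum input concrete, here is a minimal pure-Python sketch of a triangular mel filterbank applied to one frame's power spectrum, followed by a logarithm. It assumes the common HTK-style mel formula mel = 2595 log10(1 + f/700); the abstract does not state the filterbank settings, so `n_filters`, `n_fft`, `sample_rate` and the helper names are hypothetical, and this is not the authors' MATLAB pipeline.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters over the n_fft // 2 + 1 non-negative FFT bins."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_filters + 2 points equally spaced on the mel scale give every
    # filter a left edge, a centre and a right edge.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    mel_points = [m_lo + i * (m_hi - m_lo) / (n_filters + 1)
                  for i in range(n_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left < f <= centre:
                row.append((f - left) / (centre - left))   # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_frame(power_spectrum, bank, eps=1e-10):
    """Log of the mel-weighted energies of one frame; eps avoids log(0)."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in bank]
```

Stacking `log_mel_frame` outputs over successive frames yields a time-frequency image of the kind a CNN can consume.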
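The excitation features start from the LP residual. The sketch below, written under stated assumptions, computes LP coefficients with the autocorrelation method and Levinson-Durbin recursion, forms the residual e[n] = s[n] - sum_k a_k s[n-k], and derives the residual phase as the cosine of the phase of the residual's analytic signal (residual divided by its Hilbert envelope), a definition commonly used in the speaker-recognition literature. The LP order, the use of a direct O(N^2) DFT, and all function names are illustrative choices; sharpness, EoE and SoE are not defined in the abstract and are not sketched here.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x_hat[n] = sum_k a[k] x[n-k]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= (1.0 - k * k)  # prediction-error energy update
    return a[1:]


def lp_residual(x, order):
    """Inverse-filter x with its own LP coefficients (samples before 0 are 0)."""
    a = levinson_durbin(autocorrelation(x, order), order)
    residual = []
    for n in range(len(x)):
        pred = sum(a[k - 1] * x[n - k] for k in range(1, order + 1) if n - k >= 0)
        residual.append(x[n] - pred)
    return residual


def analytic_signal(x):
    """Analytic signal via a direct DFT; adequate for short frames only."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for f in range(1, n // 2):
            gain[f] = 2.0
    else:
        for f in range(1, (n + 1) // 2):
            gain[f] = 2.0
    spec = [s * g for s, g in zip(spec, gain)]
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def residual_phase(residual, eps=1e-12):
    """Return (cos of analytic-signal phase, Hilbert envelope) of the residual."""
    z = analytic_signal(residual)
    envelope = [abs(v) for v in z]
    phase = [r / (e + eps) for r, e in zip(residual, envelope)]
    return phase, envelope
```

In practice the residual would be computed frame by frame on pre-emphasised speech with an order around 10 to 16 at 8 to 16 kHz; those values are typical choices, not settings reported in the abstract.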
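A dilated convolution spaces its kernel taps `dilation` samples apart, so stacking layers with growing dilation enlarges the receptive field without adding weights. The following 1-D pure-Python sketch illustrates the operation and the receptive-field rule 1 + sum((k - 1) * d) for stride-1 layers; the paper's network is presumably 2-D and built in MATLAB, so the kernel size, dilation schedule and function names here are illustrative assumptions only.

```python
def dilated_conv1d(x, weights, dilation, bias=0.0):
    """'Valid' dilated 1-D convolution (cross-correlation, as in most CNN
    libraries): y[n] = bias + sum_k weights[k] * x[n + k * dilation]."""
    span = (len(weights) - 1) * dilation
    # range() of a non-positive length is empty, so too-short inputs give [].
    return [bias + sum(w * x[n + k * dilation] for k, w in enumerate(weights))
            for n in range(len(x) - span)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Usage: three kernel-3 layers with dilations 1, 2, 4 see 15 input samples.
signal = [float(i) for i in range(20)]
out = signal
for d in (1, 2, 4):
    out = dilated_conv1d(out, [1.0, 1.0, 1.0], d)
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the parameter count grows only linearly.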
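The Equal Error Rate is the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal sketch, assuming higher scores mean "same speaker" and that every observed score is tried as a threshold, is shown below; the function name and the tie-handling rule (accept when score >= threshold) are assumptions, and this is not the authors' evaluation code.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold.
    FAR = fraction of impostor trials accepted,
    FRR = fraction of genuine trials rejected.
    Returns (EER, threshold), where EER is the mean of FAR and FRR at the
    threshold where the two rates are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Usage: overlapping score sets give FAR = FRR = 0.25 at threshold 0.6.
eer, threshold = equal_error_rate([0.6, 0.8, 0.9, 0.4], [0.1, 0.5, 0.3, 0.7])
```

With large trial lists, interpolating between the two thresholds that bracket the FAR/FRR crossing gives a smoother estimate than this nearest-point rule.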
