Investigation of Self-attention based Architectures for Speaker Verification


Name of the Speaker: Metilda Sagaya Mary N J (EE18D013)
Name of the Guide: Dr. S. Umesh
Date/Time: 28th November 2022, 3.00 pm

Speaker Verification uses speech as a biometric to verify the identity claimed by a speaker. Speech is a convenient modality for identity verification and can be combined with other verification techniques to improve overall performance. Speaker Verification based on speaker embeddings extracted from trained Neural Networks is now widely used. X-vectors and R-vectors are well-known speaker embeddings, obtained from Time-Delay Neural Networks and Residual Neural Networks, respectively. These networks build up context slowly, because the receptive field widens only gradually with depth. Self-attention based networks, which can capture global context in every layer, are therefore worth exploring as a source of better speaker embeddings. Once the embeddings are obtained, a trial pair is conventionally scored using Probabilistic Linear Discriminant Analysis (PLDA). PLDA is not a Neural Network based technique, which leaves considerable scope for exploring Neural Network based architectures for scoring a trial pair.

In this talk, we will discuss the performance of self-attention based architectures, which take global context into account in all of their layers, both for generating speaker embeddings and for scoring trial pairs.
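
As a rough illustration of the pipeline outlined above, the following toy sketch shows how a single self-attention layer lets every frame attend to every other frame (global context in one step), how the attended frames can be pooled into an utterance-level embedding, and how a trial pair can then be scored. It is not the method presented in the talk. The function names, the identity query/key/value projections, the mean pooling, and the 2-dimensional toy "frames" are all assumptions made for illustration, and cosine similarity stands in for PLDA or a neural scoring back-end.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the frames themselves
    (identity projections), so there are no learned weights here.
    Every output frame is a weighted mix of ALL input frames.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in frames]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, frames))
                    for j in range(d)])
    return out


def embed(frames):
    """Mean-pool the attended frames into one utterance-level embedding."""
    attended = self_attention(frames)
    n = len(attended)
    return [sum(f[j] for f in attended) / n for j in range(len(attended[0]))]


def cosine_score(a, b):
    """Cosine similarity, used here as a simple stand-in for PLDA scoring."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical 2-dimensional "frames"; real systems use acoustic features.
enrol = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2]]
test_same = [[0.8, 0.1], [1.0, 0.0]]
test_diff = [[0.0, 1.0], [0.1, 0.9]]

score_same = cosine_score(embed(enrol), embed(test_same))
score_diff = cosine_score(embed(enrol), embed(test_diff))
print(score_same > score_diff)
```

In a trained system the projections and pooling would be learned, and the trial decision would compare the score against a threshold tuned on held-out data.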