Callhome diarization recipe using x-vectors
TODO
Pretrained Model
The pretrained model is to be uploaded to kaldi-asr.org.
Files list
./
  README.txt                 This file
  run.sh                     The recipe that was in egs/callhome_diarization/v2/run.sh

local/nnet3/xvector/tuning/
  run_xvector_1a.sh          Generates the configs and egs, and trains the model

conf/
  vad.conf                   Energy VAD configuration
  mfcc.conf                  MFCC configuration

exp/xvector_nnet_1a/
  final.raw                  The pretrained model
  nnet.config                An nnet3 config file for instantiating the model
  extract.config             An nnet3 config file for extracting xvectors
  min_chunk_size             Min chunk size used (see extract_xvectors.sh)
  max_chunk_size             Max chunk size used (see extract_xvectors.sh)
  srand                      The RNG seed used

exp/xvectors_callhome1/
  mean.vec                   Vector for centering, from callhome1
  transform.mat              Whitening matrix, trained on callhome1
  plda                       PLDA model for callhome1, trained on SRE data

exp/xvectors_callhome2/
  mean.vec                   Vector for centering, from callhome2
  transform.mat              Whitening matrix, trained on callhome2
  plda                       PLDA model for callhome2, trained on SRE data
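As a rough illustration of what the centering vector and whitening matrix are for, the sketch below subtracts a mean from an x-vector and then multiplies by a transform matrix, i.e. y = W (x - mean). It is plain Python with toy values; the function names and the numbers are assumptions made for illustration, and this is not how Kaldi itself reads or applies mean.vec and transform.mat.

```python
def center(xvector, mean):
    """Subtract the global mean (the role of mean.vec) from an x-vector."""
    if len(xvector) != len(mean):
        raise ValueError("x-vector and mean must have the same dimension")
    return [x - m for x, m in zip(xvector, mean)]


def apply_transform(matrix, vector):
    """Multiply a row-major matrix (the role of transform.mat) by a vector."""
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("matrix column count must match vector dimension")
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def center_and_whiten(xvector, mean, matrix):
    """Center first, then apply the transform: y = W (x - mean)."""
    return apply_transform(matrix, center(xvector, mean))


# Toy 2-dimensional values, chosen only to make the arithmetic easy to follow.
toy_xvector = [2.0, 4.0]
toy_mean = [1.0, 1.0]
toy_matrix = [
    [1.0, 0.0],
    [0.0, 0.5],
]
whitened = center_and_whiten(toy_xvector, toy_mean, toy_matrix)
print(whitened)
```

The statistics for callhome1 and callhome2 are kept in separate directories, so each half has its own mean, transform, and PLDA model.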
Training Data
The xvector DNN was trained on the following corpora:
Corpus              LDC Catalog No.
SRE2004             LDC2006S44
SRE2005 Train       LDC2011S01
SRE2005 Test        LDC2011S04
SRE2006 Train       LDC2011S09
SRE2006 Test 1      LDC2011S10
SRE2006 Test 2      LDC2012S01
SRE2008 Train       LDC2011S05
SRE2008 Test        LDC2011S08
SWBD2 Phase 2       LDC99S79
SWBD2 Phase 3       LDC2002S06
SWBD Cellular 1     LDC2001S13
SWBD Cellular 2     LDC2004S07
The following datasets were used for data augmentation:
MUSAN               http://www.openslr.org/17
RIR_NOISES          http://www.openslr.org/28
Results
The models should produce results similar to the following on Callhome. The acoustic ivector system is included for reference (see egs/callhome_diarization/v1).
xvector DER: 8.39% with supervised calibration, 7.12% with oracle number of speakers
ivector (from ../v1) DER: 10.36% with supervised calibration, 8.69% with oracle number of speakers
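For readers unfamiliar with the metric, DER (diarization error rate) is conventionally the sum of missed speech, false alarm speech, and speaker confusion time, divided by the total reference speech time. The sketch below shows only that arithmetic; the function name and the example durations are hypothetical and are not taken from this recipe's scoring setup.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a percentage, given error durations in seconds."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


# Hypothetical durations in seconds, chosen only to show the arithmetic.
example_der = diarization_error_rate(
    missed=20.0, false_alarm=10.0, confusion=50.0, total_speech=1000.0
)
print(f"DER: {example_der:.2f}%")
```

Lower is better, which is why the xvector numbers above compare favorably with the ivector reference.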
Citation
If you want to use the pretrained model in a paper, please cite as:
@inproceedings{snyder2018xvector,
title={X-vectors: Robust DNN Embeddings for Speaker Recognition},
author={Snyder, D. and Garcia-Romero, D. and Sell, G. and Povey, D. and Khudanpur, S.},
booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2018},
organization={IEEE},
url={http://www.danielpovey.com/files/2018_icassp_xvectors.pdf}
}