Technical Program

Paper Detail

Paper: SP-P11.2
Session: Topics in Large Vocabulary Continuous Speech Recognition
Time: Thursday, May 20, 09:30 - 11:30
Presentation: Poster
Topic: Speech Processing: Large Vocabulary Recognition/Search
Title: ADVANCES IN UNSUPERVISED AUDIO SEGMENTATION FOR THE BROADCAST NEWS AND NGSW CORPORA
Authors: Rongqing Huang; University of Colorado, Boulder 
 John H. L. Hansen; University of Colorado, Boulder 
Abstract: Unsupervised audio segmentation remains a challenging research problem that significantly impacts Automatic Speech Recognition (ASR) and Spoken Document Retrieval (SDR) performance. This paper presents advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be more appropriate for segmentation: PMVDR (Perceptual Minimum Variance Distortionless Response), SZCR (Smoothed Zero Crossing Rate), and FBLC (FilterBank Log Coefficients). Next, we consider a new distance metric, T2-mean, which is intended to improve segmentation for short segments (<5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We establish a more effective evaluation procedure for segmentation than the more traditional EER and Frame Accuracy approaches. Employing these advances within our new scheme results in more than a 30% improvement in segmentation performance on the 3-hour Hub4 Broadcast News 1997 evaluation data. Evaluations are also presented for audio from the NGSW corpus.
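
Illustrative sketch (not from the paper): the abstract names SZCR and the T2-mean distance but does not define them. The Python below is only a rough guess at the general idea, assuming that SZCR is a per-frame zero-crossing rate smoothed with a moving average and that T2-mean resembles the standard two-sample Hotelling T-squared statistic on segment means. The function names, framing parameters, and smoothing window are all hypothetical, and the authors' actual definitions may differ.

```python
import numpy as np


def zero_crossing_rate(signal, frame_len, hop):
    """Per-frame fraction of adjacent sample pairs whose signs differ."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    rates = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        signs = np.sign(frame)
        signs[signs == 0] = 1  # assumption: treat exact zeros as positive
        rates[i] = np.mean(signs[1:] != signs[:-1])
    return rates


def smooth(values, width):
    """Moving-average smoothing (assumed form of the 'smoothed' ZCR)."""
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="same")


def hotelling_t2(x, y):
    """Two-sample Hotelling T-squared between feature matrices.

    x and y have shape (frames, dims). The covariance is pooled over
    both segments, so only the difference in means drives the distance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n - 1) * np.cov(x, rowvar=False)
              + (m - 1) * np.cov(y, rowvar=False)) / (n + m - 2)
    pooled = np.atleast_2d(pooled)  # handles the one-dimensional case
    return (n * m / (n + m)) * float(diff @ np.linalg.solve(pooled, diff))
```

In a change-detection setting, a statistic of this kind would be computed between the frames on either side of a candidate boundary, with a large value suggesting a speaker or acoustic change. Because it relies on means and one pooled covariance, it needs fewer frames than methods that estimate a separate covariance per segment, which is one plausible reason such a metric could suit short segments.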
 



©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004