Paper: SP-P11.2
Session: Topics in Large Vocabulary Continuous Speech Recognition
Time: Thursday, May 20, 09:30 - 11:30
Presentation: Poster
Topic: Speech Processing: Large Vocabulary Recognition/Search
Title: ADVANCES IN UNSUPERVISED AUDIO SEGMENTATION FOR THE BROADCAST NEWS AND NGSW CORPORA
Authors: Rongqing Huang; University of Colorado, Boulder
John H. L. Hansen; University of Colorado, Boulder
Abstract: Unsupervised audio segmentation remains a challenging research problem that significantly impacts Automatic Speech Recognition (ASR) and Spoken Document Retrieval (SDR) performance. This paper addresses novel advances in audio segmentation for unsupervised multi-speaker change detection. First, we investigate new features intended to be more appropriate for segmentation: PMVDR (Perceptual Minimum Variance Distortionless Response), SZCR (Smoothed Zero Crossing Rate), and FBLC (FilterBank Log Coefficients). Next, we consider a new distance metric, T2-mean, which is intended to improve segmentation for short segments (<5 s). A novel false alarm compensation procedure is also developed and applied after the segmentation phase. We establish a more effective evaluation procedure for segmentation than the more traditional EER and Frame Accuracy approaches. Employing these advances within our new scheme yields more than a 30% improvement in segmentation performance on the 3-hour Hub4 Broadcast News 1997 evaluation data. Evaluations are also presented for audio from the NGSW corpus.
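As a rough illustration of two ideas named in the abstract, the sketch below computes a smoothed zero-crossing rate and a two-sample Hotelling T^2 statistic between two feature segments. It is not the authors' implementation: the abstract does not define SZCR or T2-mean, so the moving-average smoother, its width, the diagonal pooled covariance, and the function names (zero_crossing_rate, smooth, t2_diagonal) are all assumptions made for this example.

```python
from statistics import fmean


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def smooth(values, width=3):
    """Centered moving average (assumed smoother); window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(fmean(window))
    return out


def t2_diagonal(seg_a, seg_b):
    """Two-sample Hotelling T^2 with a pooled, diagonal covariance.

    The diagonal covariance is a simplifying assumption, and this is the
    textbook statistic, not the paper's T2-mean variant.
    Each segment is a list of equal-length feature vectors, and every
    dimension is assumed to have nonzero pooled variance.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    dims = len(seg_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [v[d] for v in seg_a]
        col_b = [v[d] for v in seg_b]
        mean_a, mean_b = fmean(col_a), fmean(col_b)
        ss = sum((x - mean_a) ** 2 for x in col_a)
        ss += sum((x - mean_b) ** 2 for x in col_b)
        pooled_var = ss / (n_a + n_b - 2)
        total += (mean_a - mean_b) ** 2 / pooled_var
    return (n_a * n_b) / (n_a + n_b) * total
```

In a change-detection setting, a statistic like this would be evaluated at candidate boundaries between adjacent windows, with larger values suggesting a speaker or acoustic change. Thresholding and the false alarm compensation step are not covered here.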