Technical Program

Paper Detail

Paper:	MLSP-P3.11
Session:	Speech and Audio Processing
Time:	Wednesday, May 19, 15:30 - 17:30
Presentation:	Poster
Topic:	Machine Learning for Signal Processing: Signal detection, Pattern Recognition and Classification
Title:	MULTIBAND STATISTICAL LEARNING FOR F0 ESTIMATION IN SPEECH
Authors:	Fei Sha; University of Pennsylvania
	Ashley Burgoyne; University of Pennsylvania
	Lawrence Saul; University of Pennsylvania
Abstract:	We investigate a simple algorithm that combines multiband processing and least squares fits to estimate F0 contours in speech. The algorithm is untraditional in several respects: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably, in real time, without the need for postprocessing to produce smooth contours. We show that a baseline implementation of the algorithm, though already quite accurate, is significantly improved by incorporating a model of statistical learning into its final stages. Model parameters are estimated from training data to minimize the likelihood of gross errors in F0, as well as errors in classifying voiced versus unvoiced speech. Experimental results on several databases confirm the benefits of statistical learning.
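
The abstract does not spell out its least squares fit, so the following is only a minimal sketch of one standard way to estimate the frequency of a single sinusoid without FFTs, autocorrelation, or peak picking. It assumes a clean single-band tone and relies on the identity x[n-1] + x[n+1] = 2 cos(w) x[n], which holds for any pure sinusoid. The function names `estimate_frequency` and `make_tone` are hypothetical. The multiband filtering and the statistical learning stage described in the abstract are not modeled here.

```python
import math


def estimate_frequency(x, fs):
    """Least-squares frequency estimate (Hz) for a single sinusoid.

    Illustrative assumption, not the authors' implementation: fit the
    relation x[n-1] + x[n+1] = 2*c*x[n] for c = cos(w) in the
    least-squares sense, then convert w to Hz.
    """
    num = 0.0  # running sum of x[n] * (x[n-1] + x[n+1])
    den = 0.0  # running sum of x[n]^2
    for n in range(1, len(x) - 1):
        num += x[n] * (x[n - 1] + x[n + 1])
        den += x[n] * x[n]
    if den == 0.0:
        return None  # silent frame: no frequency can be estimated
    # Closed-form minimizer of sum (x[n-1] + x[n+1] - 2*c*x[n])^2 over c,
    # clamped so rounding error cannot push it outside acos's domain.
    c = max(-1.0, min(1.0, num / (2.0 * den)))
    return math.acos(c) * fs / (2.0 * math.pi)


def make_tone(f0, fs, n_samples):
    """Synthesize a unit-amplitude test sinusoid at f0 Hz."""
    return [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(n_samples)]


if __name__ == "__main__":
    tone = make_tone(200.0, 8000.0, 400)
    print(estimate_frequency(tone, 8000.0))
```

Because the estimate depends only on two sums, they can be accumulated as running totals and refreshed with each new sample, which is one way an update could proceed sample by sample. The frequency comes out of a closed-form expression, so its resolution is not tied to an FFT bin width or to the sampling grid.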

©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004