Paper: | MLSP-P3.11
Session: | Speech and Audio Processing
Time: | Wednesday, May 19, 15:30 - 17:30
Presentation: | Poster
Topic: | Machine Learning for Signal Processing: Signal detection, Pattern Recognition and Classification
Title: | MULTIBAND STATISTICAL LEARNING FOR F0 ESTIMATION IN SPEECH
Authors: | Fei Sha; University of Pennsylvania
| Ashley Burgoyne; University of Pennsylvania
| Lawrence Saul; University of Pennsylvania
Abstract: | We investigate a simple algorithm that combines multiband processing and least squares fits to estimate F0 contours in speech. The algorithm is untraditional in several respects: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably, in real time, without the need for postprocessing to produce smooth contours. We show that a baseline implementation of the algorithm, though already quite accurate, is significantly improved by incorporating a model of statistical learning into its final stages. Model parameters are estimated from training data to minimize the likelihood of gross errors in F0, as well as errors in classifying voiced versus unvoiced speech. Experimental results on several databases confirm the benefits of statistical learning.
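The abstract does not spell out the least squares fit, so the following is only an illustrative sketch of one standard way to estimate the frequency of a sinusoid without FFTs, autocorrelation, or peak picking. It assumes a single clean sinusoid, which satisfies x[n-1] + x[n+1] = 2 cos(w) x[n], and fits the one coefficient 2 cos(w) by least squares. The function name `estimate_frequency`, the 16 kHz sample rate, and the 200 Hz test tone are hypothetical choices for the example and are not taken from the paper. In a multiband setting, a fit of this kind would presumably be applied to the output of each band-pass filter; that stage is not shown.

```python
import math


def estimate_frequency(samples, sample_rate):
    """Least-squares frequency estimate (Hz) for a single sinusoid.

    Assumption for this sketch: the input is close to one sinusoid, so
    x[n-1] + x[n+1] = c * x[n] with c = 2*cos(w). The least-squares
    solution for c is sum(x[n] * (x[n-1] + x[n+1])) / sum(x[n]**2).
    """
    num = 0.0
    den = 0.0
    for n in range(1, len(samples) - 1):
        # Both sums are running totals, so they could also be updated
        # one sample at a time in a streaming implementation.
        num += samples[n] * (samples[n - 1] + samples[n + 1])
        den += samples[n] * samples[n]
    if den == 0.0:
        return None  # silent input: no frequency to report
    c = max(-2.0, min(2.0, num / den))  # keep acos argument in range
    w = math.acos(c / 2.0)  # radians per sample
    return w * sample_rate / (2.0 * math.pi)


# Hypothetical test tone: 200 Hz sampled at 16 kHz for 50 ms.
sample_rate = 16000.0
true_f0 = 200.0
signal = [math.sin(2.0 * math.pi * true_f0 * n / sample_rate)
          for n in range(800)]
estimate = estimate_frequency(signal, sample_rate)
```

Because the frequency comes from a closed-form fit rather than from locating a spectral or autocorrelation peak, its resolution is not tied to a bin spacing or lag grid, which is in the spirit of the properties the abstract lists. Real speech is not a single sinusoid, so this sketch says nothing about the voiced/unvoiced decision or the statistical learning stage described above.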