Paper Detail

Paper: SP-P16.1
Session: Speech Modeling for Robust Speech Recognition
Time: Friday, May 21, 15:30 - 17:30
Presentation: Poster
Topic: Speech Processing: Robust Speech Recognition
Title: DBN BASED MULTI-STREAM MODELS FOR AUDIO-VISUAL SPEECH RECOGNITION
Authors: John Gowdy; Clemson University 
 Amarnag Subramanya; Clemson University 
 Chris Bartels; University of Washington 
 Jeff Bilmes; University of Washington 
Abstract: In this paper, we propose a model based on Dynamic Bayesian Networks (DBNs) to integrate information from multiple audio and visual streams. We compare the DBN-based system, implemented using the Graphical Model Toolkit (GMTK), with a classical HMM, implemented in the Hidden Markov Model Toolkit (HTK), for both the single-stream and two-stream integration problems. We also propose a new model (mixed integration) to integrate information from three or more streams derived from different modalities, and compare its performance with that of a synchronous integration scheme. A new technique for estimating stream confidence measures for the integration of three or more streams is also suggested. Results from our implementation using the Clemson University Audio Visual Experiments (CUAVE) database indicate an absolute improvement of about 4% in word accuracy at an SNR of -4 dB for the mixed integration models.
 

©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004