Paper: | SP-P16.1 | ||
Session: | Speech Modeling for Robust Speech Recognition | ||
Time: | Friday, May 21, 15:30 - 17:30 | ||
Presentation: | Poster | ||
Topic: | Speech Processing: Robust Speech Recognition | ||
Title: | DBN BASED MULTI-STREAM MODELS FOR AUDIO-VISUAL SPEECH RECOGNITION | ||
Authors: | John Gowdy; Clemson University | ||
Amarnag Subramanya; Clemson University | |||
Chris Bartels; University of Washington | |||
Jeff Bilmes; University of Washington | |||
Abstract: | In this paper, we propose a model based on Dynamic Bayesian Networks (DBNs) to integrate information from multiple audio and visual streams. We compare the DBN-based system (implemented using the Graphical Models Toolkit (GMTK)) with a classical HMM (implemented in the Hidden Markov Model Toolkit (HTK)) for both the single-stream and two-stream integration problems. We also propose a new model (mixed integration) to integrate information from three or more streams derived from different modalities, and compare its performance with that of a synchronous integration scheme. A new technique to estimate stream confidence measures for the integration of three or more streams is also suggested. Results from our implementation using the Clemson University Audio Visual Experiments (CUAVE) database indicate an absolute improvement of about 4% in word accuracy at an SNR of -4 dB for the mixed integration models. | ||