Technical Program

Paper Detail

Paper: MSP-P1.1
Session: Human Machine Interface; Signal Processing for Media Integration and Application
Time: Friday, May 21, 09:30 - 11:30
Presentation: Poster
Topic: Multimedia Signal Processing: Human-Machine Interface
Title: IMPROVED FACE AND FEATURE FINDING FOR AUDIO-VISUAL SPEECH RECOGNITION IN VISUALLY CHALLENGING ENVIRONMENTS
Authors: Jintao Jiang; University of California, Los Angeles 
 Gerasimos Potamianos; IBM T. J. Watson Research Center 
 Harriet J. Nock; IBM T. J. Watson Research Center 
 Giridharan Iyengar; IBM T. J. Watson Research Center 
 Chalapathy Neti; IBM T. J. Watson Research Center 
Abstract: Visual information from a speaker’s face is known to improve the robustness of automatic speech recognition (ASR). However, most studies in audio-visual ASR have focused on “visually clean” data to benefit ASR in noise. This paper is a follow-up to a previous study that investigated audio-visual ASR in visually challenging environments. It focuses on visual speech front-end processing and proposes an improved, appearance-based face and feature detection algorithm that uses Gaussian mixture model classifiers. This method is shown to improve the accuracy of face and feature detection, and thus visual speech recognition, over our previously used baseline system. In turn, this translates to improved audio-visual ASR, resulting in a 10% relative reduction in word error rate in noisy speech.
 

©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004