Technical Program

Paper Detail

Paper: SS-2.2
Session: Multi-Sensory Processing for Context-Aware Computing
Time: Tuesday, May 18, 13:20 - 13:40
Presentation: Special Session Lecture
Topic: Special Sessions: Multi-sensory Processing for Context-Aware Computing
Title: TOWARDS PRACTICAL DEPLOYMENT OF AUDIO-VISUAL SPEECH RECOGNITION
Authors: Gerasimos Potamianos; IBM T. J. Watson Research Center 
 Chalapathy Neti; IBM T. J. Watson Research Center 
 Jing Huang; IBM T. J. Watson Research Center 
 Jonathan H. Connell; IBM T. J. Watson Research Center 
 Stephen Chu; IBM T. J. Watson Research Center 
 Vit Libal; IBM T. J. Watson Research Center 
 Etienne Marcheret; IBM T. J. Watson Research Center 
 Norman Haas; IBM T. J. Watson Research Center 
 Jintao Jiang; House Ear Institute 
Abstract: Much progress has been achieved during the past two decades in audio-visual automatic speech recognition (AVASR). However, challenges persist that hinder AVASR deployment in practical situations, most notably the robust and fast extraction of visual speech features. We review our efforts to overcome this problem, based on an appearance-based visual feature representation of the speaker's mouth region. In particular: (a) We discuss AVASR in realistic, visually challenging domains, where lighting, background, and head pose vary significantly. To enhance visual front-end robustness in such environments, we employ an improved statistical face detection algorithm that significantly outperforms our baseline scheme. However, visual-only recognition remains inferior to that achieved on visually "clean" (studio-like) data, thus demonstrating the importance of accurate mouth region extraction. (b) We then consider a wearable audio-visual sensor to directly capture the mouth region, thus eliminating face detection. Its use improves visual-only recognition, even over full-face videos recorded in the studio-like environment. (c) Finally, we address the speed issue in visual feature extraction by discussing our real-time AVASR prototype implementation. The reported progress demonstrates the feasibility of practical AVASR.
 

©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004