Paper: SP-P13.4
Session: General Topics in Robust Speech Recognition
Time: Thursday, May 20, 13:00 - 15:00
Presentation: Poster
Topic: Speech Processing: Robust Speech Recognition
Title: A STREAM-WEIGHT OPTIMIZATION METHOD FOR AUDIO-VISUAL SPEECH RECOGNITION USING MULTI-STREAM HMMS
Authors: Satoshi Tamura; Tokyo Institute of Technology
Koji Iwano; Tokyo Institute of Technology
Sadaoki Furui; Tokyo Institute of Technology
Abstract: For multi-stream HMMs that are widely used in audio-visual speech recognition, it is important to automatically and properly adjust stream weights. This paper proposes a stream-weight optimization technique based on a likelihood-ratio maximization criterion. In our audio-visual speech recognition system, video signals are captured and converted into visual features using HMM-based techniques. Extracted acoustic and visual features are concatenated into an audio-visual vector. A multi-stream HMM is obtained from audio and visual HMMs. Experiments are conducted using Japanese connected digit speech recorded in real-world environments. Applying the MLLR (maximum likelihood linear regression) adaptation and our optimization method, we achieve a 29% absolute accuracy improvement and a 76% relative error rate reduction compared with the audio-only scheme.
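The abstract does not state the weighting formula itself. As a minimal sketch of why stream weights matter, the example below assumes the formulation commonly used for multi-stream HMMs, in which the combined log-likelihood is a weighted sum of the audio and visual log-likelihoods and the two weights sum to one. The function names, the hypothesis labels, and the scores are hypothetical illustrations, not values or code from the paper, and the likelihood-ratio maximization step that chooses the weight is not reproduced here.

```python
def combined_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Weighted sum of per-stream log-likelihoods.

    Assumed formulation (not taken from the abstract):
        log b_AV = w_A * log b_A + w_V * log b_V,  with w_A + w_V = 1
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + visual_weight * log_b_visual


def best_hypothesis(audio_scores, visual_scores, audio_weight):
    """Return the hypothesis with the highest combined log-likelihood."""
    return max(
        audio_scores,
        key=lambda h: combined_log_likelihood(
            audio_scores[h], visual_scores[h], audio_weight
        ),
    )


# Hypothetical log-likelihoods for two digit hypotheses.
# The audio stream slightly prefers "one"; the visual stream prefers "two".
audio_scores = {"one": -10.0, "two": -12.0}
visual_scores = {"one": -8.0, "two": -5.0}

# Trusting audio heavily versus trusting it little changes the decision.
print(best_hypothesis(audio_scores, visual_scores, audio_weight=0.9))
print(best_hypothesis(audio_scores, visual_scores, audio_weight=0.3))
```

With these made-up scores, the recognized digit flips as the audio weight drops, which is the kind of sensitivity that motivates adjusting the weights automatically rather than fixing them by hand.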