Paper: SP-P8.4
Session: Voice Activity Detection and Speech Segmentation
Time: Wednesday, May 19, 13:00 - 15:00
Presentation: Poster
Topic: Speech Processing: Speech Analysis
Title: VOICE ACTIVITY DETECTION USING VISUAL INFORMATION
Authors: Peng Liu, Tsinghua University; Zuoying Wang, Tsinghua University
Abstract: In traditional voice activity detection (VAD) approaches, features of the audio stream, such as frame energy, are used for the voice/non-voice decision. In this paper, we present a general framework for visual-information-based VAD in a multi-modal system. First, Gaussian mixture visual models of voice and non-voice are designed, and the decision rule is discussed in detail. Subsequently, the visual feature extraction method for VAD is investigated, and the best visual feature structure and mixture number are selected experimentally. Our experiments show that with visual-information-based VAD, a prominent reduction in frame error rate (31.1% relative) is achieved, and the audio-visual stream can be segmented into sentences for recognition much more precisely (a 98.4% relative reduction in sentence break error rate) compared with the frame-energy-based approach on clean audio. Furthermore, the performance of the visual-based VAD is independent of background noise.
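The decision rule described in the abstract — scoring each frame's visual feature vector under a voice Gaussian mixture model and a non-voice one, then thresholding the log-likelihood ratio — can be sketched as below. All parameters here (mixture weights, means, variances, the 2-dimensional feature, the zero threshold) are hypothetical placeholders for illustration, not the paper's trained models:

```python
import math

def gauss_logpdf(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at point x.
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, variances):
    # log sum_k w_k N(x; mu_k, Sigma_k), computed via log-sum-exp for stability.
    terms = [math.log(w) + gauss_logpdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def vad_decision(x, voice_gmm, nonvoice_gmm, threshold=0.0):
    # Declare the frame "voice" when the log-likelihood ratio exceeds the threshold.
    llr = gmm_loglik(x, *voice_gmm) - gmm_loglik(x, *nonvoice_gmm)
    return llr > threshold

# Toy two-mixture models over a 2-dim visual feature (hypothetical parameters).
voice = ([0.6, 0.4], [[2.0, 2.0], [3.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
nonvoice = ([0.5, 0.5], [[0.0, 0.0], [-1.0, 0.5]], [[1.0, 1.0], [1.0, 1.0]])

print(vad_decision([2.1, 1.8], voice, nonvoice))  # frame near the voice means → True
```

In a real system the two mixtures would be trained on labeled voice and non-voice frames of the extracted visual features, and the threshold tuned on held-out data.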