Paper: SP-P8.4
Session: Voice Activity Detection and Speech Segmentation
Time: Wednesday, May 19, 13:00 - 15:00
Presentation: Poster
Topic: Speech Processing: Speech Analysis
Title: VOICE ACTIVITY DETECTION USING VISUAL INFORMATION
Authors: Peng Liu, Tsinghua University; Zuoying Wang, Tsinghua University
Abstract: In traditional voice activity detection (VAD) approaches, features of the audio stream, such as frame energy, are used for the voice/non-voice decision. In this paper, we present a general framework for visual-information-based VAD in a multi-modal system. First, Gaussian mixture visual models of voice and non-voice are designed, and the decision rule is discussed in detail. Subsequently, the visual feature extraction method for VAD is investigated, and the best visual feature structure and mixture number are selected experimentally. Our experiments show that with visual-information-based VAD, a prominent reduction in frame error rate (31.1% relative) is achieved, and the audio-visual stream can be segmented into sentences for recognition much more precisely (a 98.4% relative reduction in sentence break error rate) compared with the frame-energy-based approach on clean audio. Furthermore, the performance of the visual-based VAD is independent of background noise.
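The decision rule described in the abstract — scoring each frame's visual feature vector under a voice Gaussian mixture model and a non-voice one, then thresholding the log-likelihood ratio — can be sketched as below. All parameters here (mixture weights, means, variances, the 2-dimensional feature, the zero threshold) are hypothetical placeholders for illustration, not the paper's trained models:

```python
import math

def gauss_logpdf(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at point x.
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, variances):
    # log sum_k w_k N(x; mu_k, Sigma_k), computed via log-sum-exp for stability.
    terms = [math.log(w) + gauss_logpdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def vad_decision(x, voice_gmm, nonvoice_gmm, threshold=0.0):
    # Declare the frame "voice" when the log-likelihood ratio exceeds the threshold.
    llr = gmm_loglik(x, *voice_gmm) - gmm_loglik(x, *nonvoice_gmm)
    return llr > threshold

# Toy two-mixture models over a 2-dim visual feature (hypothetical parameters).
voice = ([0.6, 0.4], [[2.0, 2.0], [3.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
nonvoice = ([0.5, 0.5], [[0.0, 0.0], [-1.0, 0.5]], [[1.0, 1.0], [1.0, 1.0]])

print(vad_decision([2.1, 1.8], voice, nonvoice))  # frame near the voice means → True
```

In a real system the two mixtures would be trained on labeled voice and non-voice frames of the extracted visual features, and the threshold tuned on held-out data.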