Paper: | MLSP-P3.8 | ||
Session: | Speech and Audio Processing | ||
Time: | Wednesday, May 19, 15:30 - 17:30 | ||
Presentation: | Poster | ||
Topic: | Machine Learning for Signal Processing: Speech and Audio Processing Applications | ||
Title: | AUDIO-VISUAL GRAPHICAL MODELS FOR SPEECH PROCESSING | ||
Authors: | John Hershey; University of California, San Diego | ||
Hagai Attias; Microsoft Research | |||
Nebojša Jojic; Microsoft Research | |||
Trausti Kristjansson; Microsoft Research | |||
Abstract: | Perceiving sounds in a noisy environment is a challenging problem. Visual lip-reading can provide relevant information but is also challenging because lips are moving and a tracker must deal with a variety of conditions. Typically, audio-visual systems have been assembled from individually engineered modules. We propose to fuse audio and video in a probabilistic generative model that implements cross-modal self-supervised learning, enabling adaptation to audio-visual data. The video model features a Gaussian mixture model embedded in a linear subspace of a sprite which translates in the video. The system can learn to detect and enhance speech in noise given only a short sequence of audio-visual data. We show some results for speech enhancement, and discuss extensions to the model that are under investigation. | ||