Paper: SP-P14.9
Session: Acoustic Modeling: Tone, Prosody, and Features
Time: Thursday, May 20, 15:30 - 17:30
Presentation: Poster
Topic: Speech Processing: Acoustic Modeling for Speech Recognition
Title: PARSING SPEECH INTO ARTICULATORY EVENTS
Authors: Kadri Hacioglu, University of Colorado, Boulder; Bryan Pellom, University of Colorado, Boulder; Wayne Ward, University of Colorado, Boulder
Abstract: In this paper, the state of speech production is defined by a number of categorical articulatory features. We describe a detector that outputs a stream (sequence of classes) for each articulatory feature given the Mel-frequency cepstral coefficient (MFCC) representation of the input speech. The detector consists of a bank of recurrent neural network (RNN) classifiers, a dynamic N-best lattice generator, and a Viterbi decoder. A bank of classifiers has previously been used for articulatory feature detection by many researchers. We extend that work first by creating dynamic N-best lattices for each feature and then by combining them into product lattices for rescoring with the Viterbi algorithm. During rescoring, we incorporate language and duration constraints along with the class posterior probabilities provided by the RNN classifiers. We present results on the TIMIT data for place and manner features and compare them to a baseline system. We report performance improvements at both the frame and segment levels.
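The sketch below is a minimal illustration of the kind of Viterbi rescoring the abstract describes, reduced to a single articulatory-feature stream: frame-level class posteriors from an RNN classifier are combined with a class-bigram score (a stand-in for the language constraint) and a class-switch penalty (a crude stand-in for the duration constraint). It is not the authors' implementation; the function name, array shapes, and penalty values are hypothetical, and the paper's product lattices over multiple feature streams are not reproduced.

```python
# Illustrative sketch only, not the system from the paper.
# Viterbi decoding of one articulatory-feature stream from per-frame class
# posteriors, with a class-bigram "language" score and a switch penalty
# acting as a simple duration/smoothness constraint.
import numpy as np

def viterbi_feature_stream(log_post, log_bigram, log_switch_penalty=-2.0):
    """log_post: (T, C) frame log-posteriors from an RNN classifier.
    log_bigram: (C, C) log class-to-class transition scores.
    log_switch_penalty: extra log-score added to off-diagonal transitions,
    discouraging rapid class changes (hypothetical value)."""
    T, C = log_post.shape
    delta = np.full((T, C), -np.inf)   # best path score ending in each class
    back = np.zeros((T, C), dtype=int) # backpointers
    delta[0] = log_post[0]
    trans = log_bigram + (1.0 - np.eye(C)) * log_switch_penalty
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans          # (prev class, cur class)
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(C)] + log_post[t]
    # trace back the best class sequence
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy usage with random posteriors for a 3-class feature stream.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(3), size=50)               # (50 frames, 3 classes)
bigram = np.log(np.full((3, 3), 1.0 / 3.0))             # uniform bigram
print(viterbi_feature_stream(np.log(post), bigram))
```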