Technical Program

Paper Detail

Session: Language Modeling and Search
Time: Friday, May 21, 16:50 - 17:10
Presentation: Lecture
Topic: Speech Processing: Language Modeling
Authors: Woosung Kim; Johns Hopkins University 
 Sanjeev Khudanpur; Johns Hopkins University 
Abstract: Statistical language model estimation requires large amounts of domain-specific text, which is difficult to obtain in many languages. We propose techniques which exploit domain-specific text in a resource-rich language to adapt a language model in a resource-deficient language. A primary advantage of our technique is that in the process of cross-lingual language model adaptation, we do not rely on the availability of any machine translation capability. Instead, we assume that only a modest-sized collection of story-aligned document-pairs in the two languages is available. We use ideas from cross-lingual latent semantic analysis to develop a single low-dimensional representation shared by words and documents in both languages, which enables us to (i) find documents in the resource-rich language pertaining to a specific story in the resource-deficient language, and (ii) extract statistics from the pertinent documents to adapt a language model to the story of interest. We demonstrate significant reductions in perplexity and error rates in a Mandarin speech recognition task using this technique.
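
Illustration: step (i) of the abstract, finding resource-rich documents that pertain to a resource-deficient story, might be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. It assumes raw term counts, a plain truncated SVD via NumPy, and cosine similarity in the reduced space; the function names, the "rich"/"poor" language tags, and the choice of k are all hypothetical. Step (ii), the language model adaptation itself, is not shown.

```python
import numpy as np


def count_vector(tagged_words, index, size):
    """Return a term-count vector; words missing from the index are ignored."""
    vec = np.zeros(size)
    for word in tagged_words:
        if word in index:
            vec[index[word]] += 1.0
    return vec


def train_projection(pairs, k):
    """Build a joint term-by-document matrix from story-aligned pairs and
    return the top-k left singular vectors plus the word index.

    `pairs` is a list of (rich_words, poor_words) tuples. Words are tagged
    with their language so identical strings in the two languages stay
    distinct rows of the matrix.
    """
    vocab_rich = sorted({w for rich, _ in pairs for w in rich})
    vocab_poor = sorted({w for _, poor in pairs for w in poor})
    index = {("rich", w): i for i, w in enumerate(vocab_rich)}
    offset = len(vocab_rich)
    index.update({("poor", w): offset + i for i, w in enumerate(vocab_poor)})
    size = len(index)

    # Each aligned pair becomes one column holding counts from both languages.
    columns = []
    for rich, poor in pairs:
        tagged = [("rich", w) for w in rich] + [("poor", w) for w in poor]
        columns.append(count_vector(tagged, index, size))
    matrix = np.stack(columns, axis=1)

    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], index


def project(words, lang, basis, index):
    """Fold a monolingual document into the shared low-dimensional space."""
    vec = count_vector([(lang, w) for w in words], index, basis.shape[0])
    return basis.T @ vec


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_rich_documents(story, candidates, basis, index):
    """Return candidate indices, most similar to the story first."""
    query = project(story, "poor", basis, index)
    scores = [cosine(query, project(doc, "rich", basis, index))
              for doc in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

A caller would first run train_projection on the story-aligned pairs, then pass a story from the resource-deficient language and a list of candidate resource-rich documents to rank_rich_documents. A practical system would likely weight the counts (for example by entropy or TF-IDF) and use a sparse SVD on a much larger matrix; those choices are left open here.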


©2015 Conference Management Services, Inc. -||- Last updated Wednesday, April 07, 2004