Technical Program

Paper Detail

Paper:	SP-L11.5
Session:	Language Modeling and Search
Time:	Friday, May 21, 16:50 - 17:10
Presentation:	Lecture
Topic:	Speech Processing: Language Modeling
Title:	CROSS-LINGUAL LATENT SEMANTIC ANALYSIS FOR LANGUAGE MODELING
Authors:	Woosung Kim; Johns Hopkins University
	Sanjeev Khudanpur; Johns Hopkins University
Abstract:	Statistical language model estimation requires large amounts of domain-specific text, which is difficult to obtain in many languages. We propose techniques which exploit domain-specific text in a resource-rich language to adapt a language model in a resource-deficient language. A primary advantage of our technique is that in the process of cross-lingual language model adaptation, we do not rely on the availability of any machine translation capability. Instead, we assume that only a modest-sized collection of story-aligned document-pairs in the two languages is available. We use ideas from cross-lingual latent semantic analysis to develop a single low-dimensional representation shared by words and documents in both languages, which enables us to (i) find documents in the resource-rich language pertaining to a specific story in the resource-deficient language, and (ii) extract statistics from the pertinent documents to adapt a language model to the story of interest. We demonstrate significant reductions in perplexity and error rates in a Mandarin speech recognition task using this technique.

©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004