Paper: SP-L11.5
Session: Language Modeling and Search
Time: Friday, May 21, 16:50 - 17:10
Presentation: Lecture
Topic: Speech Processing: Language Modeling
Title: CROSS-LINGUAL LATENT SEMANTIC ANALYSIS FOR LANGUAGE MODELING
Authors: Woosung Kim, Johns Hopkins University; Sanjeev Khudanpur, Johns Hopkins University
Abstract: Statistical language model estimation requires large amounts of domain-specific text, which is difficult to obtain in many languages. We propose techniques which exploit domain-specific text in a resource-rich language to adapt a language model in a resource-deficient language. A primary advantage of our technique is that in the process of cross-lingual language model adaptation, we do not rely on the availability of any machine translation capability. Instead, we assume that only a modest-sized collection of story-aligned document-pairs in the two languages is available. We use ideas from cross-lingual latent semantic analysis to develop a single low-dimensional representation shared by words and documents in both languages, which enables us to (i) find documents in the resource-rich language pertaining to a specific story in the resource-deficient language, and (ii) extract statistics from the pertinent documents to adapt a language model to the story of interest. We demonstrate significant reductions in perplexity and error rates in a Mandarin speech recognition task using this technique.