Technical Program

Paper Detail

Paper:	SP-P9.14
Session:	Topics in Speech Synthesis
Time:	Wednesday, May 19, 15:30 - 17:30
Presentation:	Poster
Topic:	Speech Processing: Speech Synthesis (including TTS)
Title:	SCALING OF WAVEFORM SEGMENTS ALONG THE TIME AXIS FOR CONCATENATIVE SPEECH SYNTHESIS
Authors:	Nobuyuki Nishizawa; ATR, Spoken Language Translation Laboratories
	Hisashi Kawai; ATR, Spoken Language Translation Laboratories
Abstract:	Waveform scaling along the time axis is introduced as a pitch and duration conversion method for concatenative speech synthesis. Although this method affects the spectrum as well as F0 and duration, it causes no degradation of naturalness when the scaling ratio is close to 1. In corpus-based concatenative speech synthesis, where many segment candidates with various F0 values and durations are available, large scaling ratios may be unnecessary. Experimental results indicated that scaling reduced the differences in F0 and duration between the target and the selected segment. However, they also showed that the conventional cost function used in segment selection cannot represent the degradation of naturalness caused by spectral distortion, and that the range of scaling ratios free of such degradation may be too narrow for the pitch conversion required in our synthesizer. These problems should be addressed by wider-range scaling combined with a new cost function that also accounts for the degradation.
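
As a rough illustration of what scaling a waveform along the time axis means, the sketch below resamples a segment by a given ratio and plays it back at the original sampling rate. It is not the authors' implementation: the function names `scale_time_axis` and `estimate_f0`, the use of linear interpolation, and the 16 kHz test tone are all assumptions made for this example. With a ratio r, the duration is multiplied by r and F0 is divided by r; every other spectral feature shifts by the same factor, which is the spectral side effect the abstract mentions.

```python
import numpy as np

def scale_time_axis(x, ratio):
    """Stretch waveform x along the time axis by `ratio` (hypothetical helper).

    ratio > 1 lengthens the segment and lowers F0; ratio < 1 does the opposite.
    Linear interpolation is an assumption chosen for brevity.
    """
    n_out = max(1, int(round(len(x) * ratio)))
    # Position in the original signal that each output sample is read from.
    src = np.arange(n_out) / ratio
    return np.interp(src, np.arange(len(x)), x)

def estimate_f0(x, fs):
    """Crude F0 estimate from the spacing of upward zero crossings (hypothetical helper)."""
    idx = np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0))
    return fs / np.mean(np.diff(idx))

# Illustrative input: 0.1 s of a 200 Hz sine at 16 kHz (assumed values).
fs = 16000
t = np.arange(1600) / fs
segment = np.sin(2 * np.pi * 200 * t + 1.0)

scaled = scale_time_axis(segment, 1.05)
print(len(segment), len(scaled))
print(estimate_f0(segment, fs), estimate_f0(scaled, fs))
```

With a ratio of 1.05, the segment becomes 5% longer and its F0 drops from 200 Hz to about 190 Hz. This shows why a single ratio cannot set pitch and duration independently, and why small ratios matter when many candidate segments are available.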


©2015 Conference Management Services, Inc. -||- email: webmaster@icassp2004.org -||- Last updated Wednesday, April 07, 2004