Paper: SP-P9.14
Session: Topics in Speech Synthesis
Time: Wednesday, May 19, 15:30 - 17:30
Presentation: Poster
Topic: Speech Processing: Speech Synthesis (including TTS)
Title: SCALING OF WAVEFORM SEGMENTS ALONG THE TIME AXIS FOR CONCATENATIVE SPEECH SYNTHESIS
Authors: Nobuyuki Nishizawa; ATR, Spoken Language Translation Laboratories
Hisashi Kawai; ATR, Spoken Language Translation Laboratories
Abstract: Waveform scaling along the time axis is introduced as a pitch and duration conversion method for concatenative speech synthesis. Although this method affects the spectrum as well as F0 and duration, it causes no degradation of naturalness when the scaling ratio is close to 1. In corpus-based concatenative speech synthesis, where many segment candidates with various F0 values and durations are available, excessive scaling may be unnecessary. Experimental results indicated that, with this method, the difference in F0 and duration between the target and the selected segment became smaller. However, they also showed that the conventional cost function used in selection cannot represent the degradation of naturalness caused by spectral distortion, and that the scaling range that avoids this degradation may not be wide enough for the pitch conversion required in our synthesizer. These problems should be addressed by scaling over a wider range with a new cost function that also accounts for the degradation.
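The coupling between F0, duration, and spectrum described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `scale_time_axis` and `dominant_frequency`, the use of linear interpolation, the 16 kHz sample rate, and the synthetic 200 Hz sine standing in for a speech segment are all assumptions made for illustration. Scaling the time axis by a ratio r changes the duration by a factor of r and every frequency component, F0 included, by a factor of 1/r.

```python
import numpy as np

def scale_time_axis(segment, ratio):
    """Stretch (ratio > 1) or compress (ratio < 1) a waveform along the time axis.

    Hypothetical helper: uses plain linear interpolation, which is an
    assumption and not necessarily the resampling used in the paper.
    """
    n_in = len(segment)
    n_out = max(2, int(round(n_in * ratio)))
    # Positions in the input at which each output sample is read.
    src_pos = np.linspace(0.0, n_in - 1, n_out)
    return np.interp(src_pos, np.arange(n_in), segment)

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest peak in the magnitude spectrum."""
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

sample_rate = 16000                          # assumed sample rate in Hz
t = np.arange(1600) / sample_rate            # 0.1 s of signal
segment = np.sin(2 * np.pi * 200.0 * t)      # synthetic stand-in with F0 = 200 Hz

ratio = 0.8                                  # compress the time axis to 80 %
scaled = scale_time_axis(segment, ratio)

f0_in = dominant_frequency(segment, sample_rate)
f0_out = dominant_frequency(scaled, sample_rate)

print("duration ratio:", len(scaled) / len(segment))
print("F0 before and after:", f0_in, f0_out)
```

Because the whole spectrum shifts together with F0, formant positions move by the same factor, which is consistent with the abstract's observation that naturalness is preserved only when the ratio stays close to 1.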