TRANSFORM
Contact: Alan W Black
or Arthur Toth
We have always wanted our machines to talk to us, but most people
have strong preferences for particular voices. Current techniques in
speech synthesis can build voices that sound very close to the
original speaker, capturing the style, manner and articulation of the
source voice. However, such systems require many hours of carefully
recorded speech and expert tuning to reach an acceptable level of
quality.
An exciting new alternative method for building synthetic voices is
voice transformation. Here we use an existing recorded database and
convert it to a target voice using as few as 10-20 sentences from the
target speaker.
These techniques offer the potential to make speech synthesizers talk
in whatever voice we desire, with significantly less effort required
than previous techniques.
This project offers a new direction in voice transformation. Current
transformation techniques concentrate on a spectral mapping of the
voice, i.e. converting the properties of the speech signal. Instead,
we can use the underlying positions of the vocal tract articulators
(the teeth, tongue, lips, and velum), which give rise to the spectral
output of the voice.
Using new statistical modeling techniques, we can successfully predict
the positions of a speaker's articulators from the speech signal. We
then map between speakers in this virtual vocal tract domain and
regenerate the speech in the target voice.
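As a rough illustration of the two mapping steps above, the sketch below
chains an acoustic-to-articulatory predictor with a source-to-target
articulatory map. It is a deliberately reduced stand-in, not the project's
method: each mapping is a single joint Gaussian over scalar features (the
one-component, one-dimensional special case of the Gaussian mixture model
mapping named in the publications listed below), the training values are
synthetic, the function names are hypothetical, and the final step of
regenerating speech from articulator positions is omitted.

```python
from statistics import fmean


def fit_joint_gaussian(xs, ys):
    """Estimate the joint-Gaussian statistics needed to predict y from x."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    n = len(xs)
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return mean_x, mean_y, var_x, cov_xy


def conditional_mean(model, x):
    """Return E[y | x] under the fitted joint Gaussian."""
    mean_x, mean_y, var_x, cov_xy = model
    return mean_y + (cov_xy / var_x) * (x - mean_x)


# Synthetic, time-aligned training values (assumptions for illustration only).
acoustic = [1.0, 2.0, 3.0, 4.0]        # one acoustic feature, source speaker
source_artic = [3.0, 5.0, 7.0, 9.0]    # one articulator position, source speaker
target_artic = [0.5, 1.5, 2.5, 3.5]    # same articulator, target speaker

# Step 1: predict articulator positions from the speech signal.
inversion = fit_joint_gaussian(acoustic, source_artic)
# Step 2: map between speakers in the articulatory domain.
speaker_map = fit_joint_gaussian(source_artic, target_artic)


def transform(acoustic_value):
    """Map a source acoustic value to a target-speaker articulator position."""
    predicted_source = conditional_mean(inversion, acoustic_value)
    return conditional_mean(speaker_map, predicted_source)


print(transform(2.5))
```

A realistic system would use multi-dimensional feature vectors and several
mixture components per mapping; the conditional-mean prediction shown here
is the per-component building block of that approach.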
This work enables the easy construction of new synthetic voices,
allowing personalization of speech output. It increases our knowledge
of the speech generation process and characterizes what makes a voice
personal.
Voice Transformation Publications
- Toth, A., and Black, A. (2005)
Cross-Speaker Articulatory Position Data for Phonetic Feature Prediction,
Interspeech 2005, Lisbon, Portugal.
- Toda, T., Black, A., and Tokuda, K. (2005)
Spectral Conversion Based on Maximum Likelihood Estimation
Considering Global Variance of Converted Parameter,
ICASSP 2005, Philadelphia, Pennsylvania.
- Toda, T., Black, A., and Tokuda, K. (2004)
Acoustic-to-Articulatory Inversion Mapping with Gaussian
Mixture Model,
ICSLP 2004, Jeju, Korea.