TRANSFORM: flexible voice synthesis through articulatory voice transformation

Contact: Alan W Black or Arthur Toth

We have always wanted our machines to talk to us, but most people have strong preferences for particular voices. Current speech synthesis techniques can build voices that sound very close to the original speaker, capturing the style, manner and articulation of the source voice. However, such systems require many hours of carefully recorded speech and expert tuning to reach an acceptable level of quality.

Voice transformation is an exciting new alternative for building synthetic voices. Here we take an existing recorded database and convert it to a target voice using as few as 10 to 20 sentences from the target speaker. These techniques could let speech synthesizers talk in whatever voice we desire, with significantly less effort than previous techniques require.

This project offers a new direction in voice transformation. Current transformation techniques concentrate on a spectral mapping of the voice, i.e. converting the properties of the speech signal itself. Instead, we use the underlying positions of the vocal tract articulators (the teeth, tongue, lips and velum), which give rise to the spectral output of the voice.

Using new statistical modeling techniques, we can successfully predict the positions of a speaker's articulators from the speech signal. We then map between speakers in this virtual vocal tract domain and regenerate the speech in the target voice.
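One of the publications listed on this page applies Gaussian mixture models (GMMs) to acoustic-to-articulatory inversion. A common way to use a GMM as a frame-by-frame mapping is to fit a joint GMM on time-aligned source/target vectors and then predict the conditional mean E[y | x]. The Python/NumPy sketch below shows only that prediction step. It assumes the mixture weights, joint means and joint covariances have already been trained elsewhere, for example by EM on parallel data. The function names `gaussian_logpdf` and `gmm_map` are hypothetical and do not come from this project. The published systems may differ in important ways, such as estimating whole trajectories rather than independent frames. The same kind of mapping could, in principle, serve each step described above: acoustics to articulator positions, one speaker's articulators to another's, and articulators back to acoustics.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at a single vector x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term computed with a linear solve instead of an explicit inverse
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def gmm_map(x, weights, means, covs, dx):
    """Hypothetical helper: map a source frame x (length dx) to E[y | x].

    weights: (M,) mixture weights of a joint GMM over z = [x, y]
    means:   (M, dx + dy) joint means, laid out as [mu_x, mu_y]
    covs:    (M, dx + dy, dx + dy) joint covariance matrices
    """
    n_mix = len(weights)
    log_post = np.empty(n_mix)
    cond_means = []
    for m in range(n_mix):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx = covs[m, :dx, :dx]
        s_yx = covs[m, dx:, :dx]
        # Unnormalized log posterior of component m given x (y marginalized out)
        log_post[m] = np.log(weights[m]) + gaussian_logpdf(x, mu_x, s_xx)
        # Per-component linear regression of y on x
        cond_means.append(mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    # Normalize the posteriors in the log domain for numerical stability
    log_post -= log_post.max()
    post = np.exp(log_post)
    post /= post.sum()
    # Posterior-weighted sum of the per-component predictions
    return post @ np.array(cond_means)
```

To convert an utterance, one would apply the mapping to each analysis frame in turn, e.g. `np.array([gmm_map(f, weights, means, covs, dx) for f in frames])`. With a single mixture component, this reduces to ordinary linear regression. With many components, the mapping becomes a smooth, piecewise-linear blend.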

This work makes it easy to construct new synthetic voices, allowing speech output to be personalized. It also increases our knowledge of the speech generation process and characterizes what makes a voice personal.
Voice Transformation Publications

  • Toth, A., and Black, A. (2005) "Cross-Speaker Articulatory Position Data for Phonetic Feature Prediction", Interspeech 2005, Lisbon, Portugal.
  • Toda, T., Black, A., and Tokuda, K. (2005) "Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter", ICASSP 2005, Philadelphia, Pennsylvania.
  • Toda, T., Black, A., and Tokuda, K. (2004) "Acoustic-to-Articulatory Inversion Mapping with Gaussian Mixture Model", ICSLP 2004, Jeju, Korea.
This page is maintained by Alan W Black (awb@cs.cmu.edu).
Festvox is a project within the Language Technologies Institute (LTI) at Carnegie Mellon University.