
5-day Course on Building Voices

Overview

July 31st-August 4th, 2000, at Carnegie Mellon University in Pittsburgh, Pennsylvania.

This 5-day course will allow attendees to gain practice in building synthetic voices for speech applications. The course is aimed at developers of speech systems who wish to take better advantage of state-of-the-art synthesis techniques.

The number of attendees for this course will be limited. Course fees will be $2,500 per person. Accommodation and meals are extra. To register, please contact Kevin A. Lenzo (lenzo@cs.cmu.edu).

Attendees of the course will:

  • gain an understanding of the basic components of state-of-the-art speech synthesis technology, and their relative complexity
  • gain practical experience in building new voices and in weighing the trade-offs between general TTS synthesizers (diphone, general unit selection) and targeted, limited domain synthesizers
  • gain practical experience in tailoring voices and their applications to get the best compromise of quality, speed and ease of construction
This course is based on the FestVox Document and uses Edinburgh University's Festival Speech Synthesis System in all practicals. As these tools are free for commercial use, techniques learned in this course may be applied to building new commercial voices without further licensing.
Day one
  • Basic synthesis techniques, history and future
  • The Festival Speech Synthesis System and its usage
  • Overview of building process and simple example
Day two
  • Key synthesis components: text analysis, lexicons, linguistic processing and waveform synthesis
  • Customizing TTS for applications: tts_modes, markup etc.
Day three
  • Building new diphone voices
  • Recording, labelling and corrections
  • A larger example
Day four
  • Unit selection synthesis
  • Building limited domain voices
Day five
  • Tuning, testing and correcting voices
  • Future techniques

The course will combine lectures and practicals, including complete walkthroughs for building your own voices, so that attendees gain hands-on experience in building voices. Each attendee will be given access to a computer during the course. Voices built using these techniques can be run on any platform Festival supports, including Linux, Solaris and Windows. However, the course will use Linux workstations to demonstrate the voice building tools.

Presenters

The course will be taught and supervised by Alan W Black and Kevin A. Lenzo.

Alan W Black is a principal author of the Festival Speech Synthesis System and has many years of experience in designing and building speech synthesis systems and voices in various languages, both in academia and industry. Before that, he worked on a wide range of speech and language research projects, all leading to practical implementations.

Kevin A. Lenzo also has many years of experience in synthesis, both in academia and industry, having worked on a wide range of synthesis systems, including the development of PhoneBox, a multilingual unit selection synthesizer. He is also a respected member of the Perl programming language community, active in the open source licensing movement, and currently steward of the CMU Sphinx open source speech recognition project.

Both are committed to open source and to the transfer of technology from research to practical applications. Together they have authored the FestVox Document, been part of building many voices in a number of different languages, and developed the technology to make the building of synthetic voices better, more reliable, and available to a much larger community.

CMU/LTI This page is maintained by Alan W Black (awb@cs.cmu.edu)
Festvox is a project within LTI at Carnegie Mellon University