This document describes the processes and issues involved in building a new voice in the Festival Speech Synthesis System. It covers the stages needed to build a voice either in an already supported language or in a completely new language.
Although building very high quality, general text-to-speech (TTS) voices is still difficult, with many open research questions, we believe that reasonable quality voices for many tasks can be built with the information provided in this document. A number of languages and voices have already been implemented under Festival, including US and UK English, Castilian and Mexican Spanish, German, Polish, Greek, Welsh Gaelic, and Basque. These were often built in a short time, some in just a few weeks, by people starting with only a little knowledge of speech synthesis. Although the quality varies, all of them are text-to-speech synthesizers that can read text, such as online newspapers, at a level native speakers can easily follow.
This document and its related scripts and examples are updated frequently. You should check the latest status at http://www.festvox.org.
This document specifically offers
Note that this document is not a manual for the Festival Speech Synthesis System itself, and we assume that the user has access to the Festival system and the Edinburgh Speech Tools. Except where explicitly mentioned, voices can be constructed using these tools alone.
The latest details, and a full software distribution of the Festival Speech Synthesis System with related documents and resources, are available through the home page at http://www.cstr.ed.ac.uk/projects/festival.html