11 Final comments

This course has covered a number of specific tasks in the process of speech synthesis. The four major areas are:

  1. architecture
  2. text processing
  3. linguistic/prosodic processing
  4. waveform synthesis

Each was covered in terms of its implementation within Festival.

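Synthesis is defined over utterances, which can be created from simple text, from tokens, or from other input forms, and are then filled out by a sequence of modules. These modules fall into the three major processes of synthesis: text processing, linguistic/prosodic processing, and waveform synthesis.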

Text to speech is defined as the problem of tokenizing arbitrary text, chunking it into utterances, and then applying the appropriate modules to render each utterance as a waveform and play it.
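To make the utterance-and-modules idea concrete, here is a minimal Python sketch. It is not Festival's implementation or API (Festival itself is written in C++ with a Scheme command interpreter); the `Utterance` class, the module functions, the fixed 80 ms segment duration and the 16 kHz sample rate are all hypothetical simplifications. Each module reads what earlier modules added and fills in more of the utterance, and text to speech is reduced to chunking text into utterances and running the module list over each one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical toy pipeline for illustration only; not Festival's API.

@dataclass
class Utterance:
    """Simplified utterance: named relations, each holding a list of items."""
    text: str
    relations: Dict[str, list] = field(default_factory=dict)

# A module reads what earlier modules added and fills in more of the utterance.
Module = Callable[[Utterance], None]

SAMPLE_RATE = 16000  # assumed sample rate in Hz

def tokenize(utt: Utterance) -> None:
    # Text processing (toy): whitespace tokens, then punctuation-stripped words.
    utt.relations["Token"] = utt.text.split()
    words = [t.strip(".,!?;:").lower() for t in utt.relations["Token"]]
    utt.relations["Word"] = [w for w in words if w]

def lexical_lookup(utt: Utterance) -> None:
    # Linguistic processing (toy): letters stand in for phone segments;
    # a real system would use a lexicon and letter-to-sound rules.
    utt.relations["Segment"] = [ch for w in utt.relations["Word"] for ch in w]

def assign_durations(utt: Utterance) -> None:
    # Prosodic processing (toy): an assumed fixed 80 ms per segment.
    utt.relations["Duration"] = [0.08] * len(utt.relations["Segment"])

def synthesize_waveform(utt: Utterance) -> None:
    # Waveform synthesis (toy): silence of the predicted total length.
    n_samples = round(sum(utt.relations["Duration"]) * SAMPLE_RATE)
    utt.relations["Wave"] = [0.0] * n_samples

PIPELINE: List[Module] = [tokenize, lexical_lookup, assign_durations, synthesize_waveform]

def synth(utt: Utterance, modules: List[Module] = PIPELINE) -> Utterance:
    # Apply each module in order; each one adds relations to the utterance.
    for module in modules:
        module(utt)
    return utt

def chunk_into_utterances(text: str) -> List[str]:
    # Toy chunking: end an utterance after sentence-final punctuation.
    chunks: List[str] = []
    current: List[str] = []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "!", "?")):
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

def text_to_speech(text: str) -> List[Utterance]:
    # Chunk arbitrary text into utterances and synthesize each one.
    # Playing the audio is left to the caller; the waveforms are returned.
    return [synth(Utterance(chunk)) for chunk in chunk_into_utterances(text)]

if __name__ == "__main__":
    for utt in text_to_speech("Hello world. How are you?"):
        print(utt.relations["Word"], len(utt.relations["Wave"]), "samples")
```

Because the pipeline here is just a list of functions, replacing one module (for example, a different duration model) leaves the others untouched, which is a small-scale version of the flexibility the architecture aims for.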

Festival offers a flexible architecture in which each part may be fully parameterized and controlled to achieve the desired effect.

Although the architecture of Festival may seem daunting at first, I hope you now see why it was designed this way: so that it is flexible enough for whatever we need to do.

Note that Festival is still being developed; there are many things it will do in the future that it does not yet do. Some of these future enhancements were discussed above, but the need for others will only become apparent as we implement more parts of the system. Research and development cannot be mapped out fully in advance and must respond to problems found along the way.

These notes give only a brief overview of some aspects of speech synthesis. For further reading, the following texts are recommended.