
11 Final comments

This course has covered a number of specific tasks in the process of speech synthesis. The four major areas are

architecture
text processing
linguistic/prosodic processing
waveform synthesis

Each was covered in terms of its implementation within Festival.

Synthesis is defined for utterances, which can be created from simple text, tokens or other input, and are filled out through modules. The modules fall into the three major processes of synthesis.

Text to speech is defined as the problem of tokenizing arbitrary text, chunking it into utterances and then applying the appropriate modules to render it as waveforms and play it.
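
The flow just described can be sketched in a few lines of Python. This is only an illustration of the idea, not Festival code: the `Utterance` class, the three module functions and the `relations` dictionary are hypothetical names chosen for this sketch, the chunking rule (break after `.`, `!` or `?`) is a simplifying assumption, and the "waveform" is just a duration total rather than real samples.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Utterance:
    """Toy utterance: input tokens plus whatever the modules add to it."""
    tokens: List[str]
    relations: Dict[str, object] = field(default_factory=dict)


def tokenize(text: str) -> List[str]:
    # Whitespace tokenization; punctuation stays attached to its token.
    return text.split()


def chunk(tokens: List[str]) -> List[Utterance]:
    # Assumed rule: an utterance ends after a token ending in ., ! or ?
    utterances, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok[-1] in ".!?":
            utterances.append(Utterance(current))
            current = []
    if current:
        utterances.append(Utterance(current))
    return utterances


# Hypothetical stand-ins for the three major processes of synthesis.
def text_module(utt: Utterance) -> None:
    # Text processing: turn tokens into plain lower-case words.
    utt.relations["Word"] = [t.strip(".,!?").lower() for t in utt.tokens]


def prosody_module(utt: Utterance) -> None:
    # Linguistic/prosodic processing: a fixed duration (seconds) per word.
    utt.relations["Duration"] = [0.25] * len(utt.relations["Word"])


def waveform_module(utt: Utterance) -> None:
    # Waveform synthesis placeholder: total length instead of samples.
    utt.relations["Wave"] = sum(utt.relations["Duration"])


MODULES = [text_module, prosody_module, waveform_module]


def synthesize(text: str) -> List[Utterance]:
    # Tokenize, chunk into utterances, then let each module fill them out.
    utterances = chunk(tokenize(text))
    for utt in utterances:
        for module in MODULES:
            module(utt)
    return utterances
```

Calling `synthesize("Hello world. How are you today?")` yields two utterances, each filled out in turn by the text, prosody and waveform stand-ins. Keeping the modules in a plain list mirrors the point made in these notes: each stage can be replaced or reparameterized without touching the others.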

Festival offers a flexible architecture in which each part may be fully parameterized and controlled to achieve the desired effect.

Although the architecture of Festival may seem daunting at first, I hope you now see why it was designed this way: so that it can be flexible enough for whatever we need to do.

Note that Festival is still being developed; there are many things it will do in the future that it does not yet do. Some of these future enhancements are discussed above, but the need for others will only become apparent as we implement more parts of the system. Research and development cannot be mapped out fully in advance and must be sensitive to problems found along the way.

These notes give only a brief overview of aspects of speech synthesis. For further reading, the following texts are recommended.

"An Introduction to Text-to-Speech Synthesis" by Thierry Dutoit, Kluwer Academic Publishers, 1997. This contains the most up-to-date description of the whole text-to-speech process.
"From Text to Speech: The MITalk System" by J Allen, M Hunnicutt and D Klatt, Cambridge University Press, 1987. This book is a little old now but contains many details that people in speech synthesis should know about.
"Multilingual Text-to-Speech Synthesis: The Bell Labs Approach", ed. Richard Sproat. This includes a detailed look at the Bell Labs system, covering many aspects of the synthesis process and how they are solved within that system. It offers useful insight into many of the research aspects of the TTS process.
"Talking Machines", eds. G. Bailly and C. Benoit, North-Holland, 1992. This book is a collection of papers from an ESCA Workshop on Speech Synthesis held in Autrans in 1990. It contains many useful papers about current synthesis research.
"Progress in Speech Synthesis", eds. J van Santen, R Sproat, J Olive and J Hirschberg, Springer Verlag, 1996. This contains full-length papers from the second ESCA Workshop on Speech Synthesis, which was held in Mohonk, New York in 1994. Again, a good collection of current work in the area.
There are also a number of conferences where synthesis papers are commonly presented. Eurospeech and ICSLP, each held every two years (in alternate years), contain the latest papers and are good places to meet people working in the field.
The comp.speech frequently asked questions (FAQ) by Andrew Hunt is always a good source of links to information about synthesis and speech technology in general. It is regularly updated. It can be found at any of the following addresses, and is also regularly posted to the USENET newsgroup comp.speech:
```
Australia: http://www.speech.su.oz.au/comp.speech/ 
UK: http://svr-www.eng.cam.ac.uk/comp.speech/ 
Japan: http://www.itl.atr.co.jp/comp.speech/ 
USA: http://www.speech.cs.cmu.edu/comp.speech/ 
```
Finally, you may follow the development of the Festival Speech Synthesis System through its home page:
```
http://www.cstr.ed.ac.uk/projects/festival.html
```
