
3 Festival Overview

Festival has been built to address a number of specific issues in speech synthesis research and development. The biggest problem in speech synthesis development is that to improve one part of a system, you need all the other parts before you can test your new work. Festival is specifically designed to provide an environment in which you can develop your own small part while using the existing modules, without having to start from scratch. It also allows you to test multiple theories in exactly the same environment, which is important when evaluating research.

3.1 Users of Festival

Festival is aimed at three specific types of user:

Multi-lingual text to speech: for those who have little interest in the internal workings of the system and just want speech output.
Synthesis for language systems: for applications that generate text from known forms. In this type of system, telephone numbers, addresses, etc. can be explicitly marked, and the language and even the intonational form can be specified. This form of access requires more knowledge of the synthesis internals, but still not of their low-level details.
Synthesis development environment: in this mode, new synthesis modules, intonation models, waveform synthesizers, etc. can be developed and compared in a software environment that provides the right basic tools, so that development can concentrate on the theory rather than the implementation.

What makes this approach worthwhile is that, because development happens within the same basic system that applications actually use, there is a direct route from research to use.

The above three schemes may be mixed, and many people will use Festival at different levels. Each level roughly corresponds to a different programming language. In multi-lingual TTS mode, work would most probably be done from a Unix shell or another API, integrating Festival into some larger application. In language mode, parameters would be specified in Scheme, Festival's scripting language. When developing a new synthesis method, work would most likely be done in C++.

3.2 Core system

The core system consists of the following features.

Scheme-based scripting language: To allow easy specification of parameters and flow of control within the system, a Scheme interpreter (Scheme is a dialect of Lisp) is provided as the command interpreter. This means that many of Festival's features can be controlled at run time without recompiling the system.
C++/C core modules: The core modules are written in C++ or C and are easily interfaced to the Scheme interpreter. This combines the advantages of a fast, efficient language with the flexibility of an interpreted system.
General utterance representation: C++ classes provide a flexible, powerful representation for utterances, which makes writing functions that operate on utterances easy and efficient. This representation is provided by the Edinburgh Speech Tools Library.
Waveform I/O, formats, resampling: Many common waveform formats are cleanly supported, so waveforms, label files and coefficient files can easily be read and written. Resampling and format conversion are also supported, which makes portability much easier.
Utterance, relations, features, I/O: The utterance structure gives all utterances a common, regular form. Full access is provided from both Scheme and C++ through simple-to-use functions. Utterances, or parts of utterances, may be dumped to files in a human-readable form for external manipulation and then reloaded.
Standard data tools: A number of basic tools are available so you can use standard methods without having to build new tools. These include a Viterbi decoder, n-gram support, regular expression (Regex) matching, linear regression support, CART support (through the CART builder `wagon'), weighted finite state transducers, and stochastic context-free grammars.
Audio device access and spooling: The Edinburgh Speech Tools Library offers direct and indirect support for many types of audio output device. Spooling is also supported, so synthesis can continue while a file is being played.
Server/client model: A server/client mode is provided so that a larger, more powerful machine can be used remotely by smaller programs, saving both start-up time and the resources required on the client.

3.3 Festival 1.4.1

The current version of Festival offers the following key features.

English (British and American) and Spanish text to speech
Externally configurable, language-independent modules: phonesets, lexicons, intonation, part of speech, duration, diphone/unit selection, letter-to-sound rules, text modes.
On-line documentation: HTML and info. In addition, typing meta-h in the command interpreter gives help on the current symbol.
Example applications: saytime, latest news
Portable (Unix) distribution, with preliminary Windows NT/95 support
Multiple APIs: STML, emacs, scripting, shell, client/server.

3.4 Festival uses

Festival offers a number of APIs for synthesis.

Unix shell:

```
festival --tts news.txt
echo "Hello world" | festival --tts
```

Emacs: a Say menu to say a region or buffer, and to select language and voice
Interactive command interpreter: a Scheme-based read-eval-print loop
C++ library: for adding modules in C++
Client/server mode

3.5 Using Festival

The following is the recommended way of using Festival in this course. One of the main objectives of the course is that, at the very least, you will be able to make your computer talk using Festival, saying whatever you wish. Of course, it is also intended that you understand the processes involved in generating that speech, and understand them well enough to influence them.

The documentation is on-line in GNU Emacs info mode and in HTML format at

http://www.speech.cs.cmu.edu/festival/manual-1.4.1

See the "Quick Start" chapter in the user manual.

Because much of this course involves writing, or more usually copying and modifying, small pieces of Scheme code and rules, the following mode of working is recommended.

Add any new rules, functions and parameter settings to a file, and always name that file as an argument to Festival when you start it. For example, create a file called `ex.scm' in a new directory and add the following to it:

```
;;; Functions, rules, parameters etc. for the Festival course
;;;
(Parameter.set 'Duration_Stretch 2.0)
```

Start Festival as follows:

```
festival ex.scm
```

Now anything you synthesize should be spoken very slowly:

```
(SayText "This is a pen")
```

Remember to remove the Duration_Stretch command before doing the other exercises.
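If you prefer to prepare the course file from the shell, the steps above can be scripted. This is a minimal sketch assuming a POSIX shell; the helper names make_course_file and drop_duration_stretch are hypothetical and are not part of Festival. The second helper performs the clean-up just mentioned by deleting the Duration_Stretch line.

```shell
#!/bin/sh
# Hypothetical helpers for the course workflow (not part of Festival).

# make_course_file DIR: create DIR if needed and write DIR/ex.scm
# containing the slow-speech setting from the example above.
make_course_file() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/ex.scm" <<'EOF'
;;; Functions, rules, parameters etc. for the Festival course
;;;
(Parameter.set 'Duration_Stretch 2.0)
EOF
}

# drop_duration_stretch FILE: delete every line that mentions
# Duration_Stretch, rewriting FILE through a temporary copy.
drop_duration_stretch() {
    file=$1
    sed '/Duration_Stretch/d' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, `make_course_file ~/festival-course', then `cd ~/festival-course' and `festival ex.scm' gives the slow-speech session above, and `drop_duration_stretch ex.scm' restores normal speed for the later exercises. The rewrite goes through a temporary file because `sed -i' is not available in every version of sed.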

3.6 Exercises

These exercises are designed to work with Festival version 1.4.1. Some auxiliary programs and files are provided as part of the course; they are identified relative to COURSEDIR, which will be installed on the system where you do these exercises. Ask the course organiser for the actual pathname.

Make Festival say your name (adding an entry to the lexicon if your name is not pronounced correctly).
Make Festival say the names of all the people logged on to liddell (or some other large central machine).
Install the Emacs interface. Select a piece of text in a buffer and get Festival to say it. Find ten things that Festival doesn't say properly. (We will try to fix those things later in the course.)
How long does it take Festival to say "Alice's Adventures in Wonderland"?

3.7 Hints

This section gives hints (and sometimes full answers) to the exercises. There is a lot to learn when starting to use Festival, so these hints are here to point you in the right direction. We also provide things such as shell scripts that are not part of Festival itself but help in completing the exercises.

This can be done from the shell
```
echo My name is ... | festival --tts
```
or within the command interpreter with the command
```
(SayText "My name is ...")
```
If your name is not pronounced properly, you can add new entries to the lexicon using the function lex.add.entry. For example, the default synthesizer pronounces Ronald Reagan's surname wrongly, so we can redefine its pronunciation as
```
(lex.add.entry
 '("reagan" n (((r ei) 1) ((g a n) 0))))
```
To find out what the phone set and entry format are, it is often useful to look up similar words with the lex.lookup function, as in
```
(lex.lookup 'reagan)
```
then copy the entry and change it as desired. To keep the pronunciation, add it to the file `.festivalrc' in your home directory. This file is loaded automatically every time you run Festival, so it will always know about your name. Because there are different lexicons for different languages and dialects, you must select the lexicon or voice before setting the new pronunciation:
```
(voice_kal_diphone)
(lex.add.entry ...)
```
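Editing `.festivalrc' by hand is simple enough, but if you script it, appending blindly can leave the same entry in the file several times. A minimal sketch, assuming a POSIX shell: the hypothetical helper add_rc_line (not part of Festival) appends a Scheme line only when an identical line is not already present, defaulting to `$HOME/.festivalrc'.

```shell
#!/bin/sh
# add_rc_line LINE [FILE]: append LINE to FILE (default ~/.festivalrc)
# unless an identical line is already there. Hypothetical helper.
add_rc_line() {
    line=$1
    rc=${2:-$HOME/.festivalrc}
    touch "$rc" || return 1
    # -x: match whole lines only; -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$rc"; then
        printf '%s\n' "$line" >> "$rc"
    fi
}
```

Call it first with the voice selection, for example `add_rc_line "(voice_kal_diphone)"', and then with the lex.add.entry line you built, so that the entry goes into the right lexicon. The check compares whole lines literally, so an entry that differs only in spacing is treated as new.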
You will need the list of names of the people who are logged on. This can be done in a number of different ways; one way is:
```
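#!/bin/sh
# List the full names of everyone currently logged on: take each
# distinct login name from who, look it up in /etc/passwd and print
# the name field (field 5) up to the first comma.
who | awk '{print $1}' | sort -u |
while read -r i
do
   grep "^$i:" /etc/passwd | awk -F: '{print $5}' | sed 's/,.*$//'
done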
```
This program is also in `COURSEDIR/bin/users-logged-on'. You may want to precede the list with an introduction such as "The people currently logged on to liddell are:", and you may need to add lexical entries for some names.
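The introduction suggested above can be added in the shell before the names reach Festival. This sketch assumes a POSIX shell with awk; announce_users is a hypothetical helper, not part of Festival or the course files. It reads one name per line and prints a single sentence, joining the names with commas and a final "and".

```shell
#!/bin/sh
# announce_users [INTRO]: read one name per line on stdin and print
# "INTRO A, B and C." Blank lines are skipped. Hypothetical helper.
announce_users() {
    intro=${1:-"The people currently logged on to liddell are:"}
    awk -v intro="$intro" '
        NF { names[++n] = $0 }            # keep non-empty lines
        END {
            if (n == 0) { print intro " nobody."; exit }
            s = names[1]
            for (i = 2; i <= n; i++)      # ", " between names, " and " before the last
                s = s (i == n ? " and " : ", ") names[i]
            print intro " " s "."
        }'
}
```

For example: `COURSEDIR/bin/users-logged-on | announce_users | festival --tts'. Joining the names with commas gives Festival punctuation to phrase on, rather than a bare column of names; it is worth listening to check that the result sounds natural.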
Read the chapter on the Emacs interface in the Festival manual and change your `.emacs' accordingly. At CSTR, `festival.el' is already installed in a directory accessible to Emacs.
To save you typing the whole book, an on-line copy of Lewis Carroll's story is in
```
ftp://uiarchive.cso.uiuc.edu/pub/etext/gutenberg/etext91/alice30.txt
```
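Listening to the whole book is one way to answer, but it is quicker to synthesize to a file and work out the duration. The sketch below assumes that the `text2wave' script from the Festival distribution is installed, that its -F option sets the output sample rate and -o names the output file (check its usage message on your installation), and that the result is a 16-bit mono RIFF file with a plain 44-byte header. The helpers wav_seconds and format_hms are hypothetical.

```shell
#!/bin/sh
# wav_seconds FILE RATE: approximate duration in whole seconds of a
# 16-bit (2 bytes per sample) mono RIFF file with a 44-byte header.
wav_seconds() {
    bytes=$(wc -c < "$1" | tr -d ' ')
    rate=$2
    echo $(( (bytes - 44) / (rate * 2) ))
}

# format_hms SECONDS: print SECONDS as H:MM:SS.
format_hms() {
    s=$1
    printf '%d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}
```

For example, run `text2wave -F 16000 -o alice.wav alice30.txt', then `format_hms $(wav_seconds alice.wav 16000)'. The header assumption makes the figure approximate. Alternatively, `time festival --tts alice30.txt' measures the answer directly, at the cost of waiting for the whole book to be spoken.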
