[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This chapter covers aspects of programming within the Festival environment: creating new modules and modifying existing ones. It describes the basic classes available and gives some particular examples of things you may wish to add.
27.1 The source code | A walkthrough of the source code
27.2 Writing a new module | Example access of an utterance
The ultimate authority on what happens in the system lies in the source code itself. No matter how hard we try, and how automatic we make it, the source code will always be ahead of the documentation. Thus if you are going to be using Festival in a serious way, familiarity with the source is essential.
The lowest level functions are catered for in the Edinburgh Speech Tools, a separate library distributed with Festival. The Edinburgh Speech Tools Library offers the basic utterance structure, waveform file access, and various other useful low-level functions which we share between different speech systems in our work. See (speechtools)Top section ‘Overview’ in Edinburgh Speech Tools Library Manual.
The directory structure for the Festival distribution reflects the conceptual split in the code.
The user-level executable binaries and scripts that are part of the festival system. These are simple symbolic links to the binaries or, if the system is compiled with shared libraries, small wrap-around shell scripts that set LD_LIBRARY_PATH appropriately.
This contains the texinfo documentation for the whole system. The ‘Makefile’ constructs the info and/or html version as desired. Note that the festival binary itself is used to generate the lists of functions and variables used within the system, so it must be compiled and in place to generate a new version of the documentation.
This contains various examples. Some are explained within this manual, others are there just as examples.
The basic Scheme parts of the system, including ‘init.scm’, the first file loaded by festival at start-up time. Depending on your installation, this directory may also contain subdirectories containing lexicons, voices and databases. This directory and its sub-directories are used by Festival at run-time.
Executables for Festival’s internal use. A subdirectory containing at least the audio spooler will be automatically created (one for each different architecture the system is compiled on). Scripts are added to this top level directory itself.
By default this contains the voices used by Festival including their basic Scheme set up functions as well as the diphone databases.
This contains various lexicon files distributed as part of the system.
This contains the basic ‘Makefile’ configuration files for compiling the system (run-time configuration is handled by Scheme in the ‘lib/’ directory). The file ‘config/config’, created as a copy of the standard ‘config/config-dist’, is the installation-specific configuration. In most cases a simple copy of the distribution file will be sufficient.
The main C++/C source for the system.
Where the ‘libFestival.a’ is built.
Where include files shared between various parts of the system live. The file ‘festival.h’ provides access to most of the parts of the system.
Contains the top level C++ files for the actual executables. This is the directory where the executable binary ‘festival’ is created.
The main core of the Festival system. At present everything is held in a single sub-directory ‘./src/arch/festival/’. This contains the basic core of the synthesis system itself: Lisp front ends to access the core utterance architecture and phonesets, basic tools such as client/server support and ngram support, and an audio spooler.
In contrast to the ‘arch/’ directory this contains the non-core parts of the system. A set of basic example modules is included with the standard distribution. These are the parts that do the synthesis; the other parts are just there to make module writing easier.
This contains some basic simple modules that weren’t quite big enough to deserve their own directory. Most importantly it includes the Initialize module, called by many synthesis methods, which sets up an utterance structure and loads in initial values. This directory also contains phrasing, part of speech, and word (syllable and phone construction from words) modules.
This is not really a module in the true sense (the Word module is the main user of this). This contains functions to construct, compile, and access lexicons (entries of words, part of speech and pronunciations). This also contains a letter-to-sound rule system.
This contains various intonation systems, from the very simple to quite complex parameter driven intonation systems.
This contains various duration prediction systems, from the very simple (fixed duration) to quite complex parameter driven duration systems.
A basic diphone synthesizer system, supporting a simple database format (which can be grouped into a more efficient binary representation). It is multi-lingual, and allows multiple databases to be loaded at once. It offers a choice of concatenation methods for diphones: residual excited LPC or PSOLA (TM) (which is not distributed).
Various text analysis functions, particularly the tokenizer and utterance segmenter (from arbitrary files). This directory also contains the support for text modes and SGML.
An LPC based diphone synthesizer. Very small and neat.
The Festival/Scheme front end to an XML parser written by Richard Tobin from University of Edinburgh’s Language Technology Group. rxp is now part of the speech tools rather than just Festival.
A simple interface to the Stochastic Context Free Grammar parser in the speech tools library.
An optional module containing the previously used diphone synthesizer.
A partial implementation of a cluster unit selection algorithm as described in black97c.
This consists of a new set of modules for doing waveform synthesis. They are intended to be unit size independent (e.g. diphone, phone, non-uniform unit). Also, selection, prosodic modification, joining and signal processing are separately defined. Unfortunately this code has not really been exercised enough to be considered stable enough for use in the default synthesis method, but those working on new synthesis techniques may be interested in integration using these new modules. They may be updated before the next full release of Festival.
Other optional directories may be contained here containing various research modules not yet part of the standard distribution. See below for descriptions of how to add modules to the basic system.
One intended use of Festival is to offer a software system where new modules may be easily tested in a stable environment. We have tried to make the addition of new modules easy, without requiring complex modifications to the rest of the system.
All of the basic modules should really be considered merely as example modules. Without much effort all of them could be improved.
This section gives a simple example of writing a new module, showing the basic steps that must be done to create and add a new module that is available for the rest of the system to use. Note that many things can now be done solely in Scheme, and really only low-level, very intensive things (like waveform synthesizers) need be coded in C++.
The example here is a duration module which sets durations of phones from a given list of averages. To make this example more interesting, all durations in accented syllables are increased by a factor of 1.5. Note that this is just an example for the sake of one; this (and much better techniques) could easily be done within the system as it is at present using a hand-crafted CART tree.
Our new module, called Duration_Simple, can most easily be added to the ‘./src/Duration/’ directory in a file ‘simdur.cc’. You can worry about the copyright notice, but after that you’ll probably need the following includes
    #include <festival.h>
The module itself must be declared in a fixed form. That is, it receives a single LISP form (an utterance) as an argument and returns that LISP form at the end. Thus our definition will start
    LISP FT_Duration_Simple(LISP utt)
    {
Next we need to declare an utterance structure and extract it from the LISP form. We also make a few other variable declarations
        EST_Utterance *u = get_c_utt(utt);
        EST_Item *s;
        float end=0.0, dur;
        LISP ph_avgs,ldur;
We cannot list the average durations for each phone in the source code as we cannot tell which phoneset we are using (or what modifications we want to make to durations between speakers). Therefore the phone and average duration information is held in a Scheme variable for easy setting at run time. To use the information in our C++ domain we must get that value from the Scheme domain. This is done with the following statement.
        ph_avgs = siod_get_lval("phoneme_averages","no phoneme durations");
The first argument to siod_get_lval is the Scheme name of a variable which has been set to an assoc list of phone and average duration before this module is called. See the variable phone_durations in ‘lib/mrpa_durs.scm’ for the format. The second argument to siod_get_lval is an error message to be printed if the variable phoneme_averages is not set. If the second argument to siod_get_lval is NULL then no error is given, and if the variable is unset this function simply returns the Scheme value nil.
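The two behaviours of the second argument (raise an error with the given message, or quietly return nil) can be pictured with a small self-contained sketch. It does not use the real SIOD API: `Env` and `get_lval` are hypothetical stand-ins, and a null pointer plays the role of nil.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical variable table standing in for the Scheme environment.
using Env = std::map<std::string, std::string>;

// Sketch of the lookup described above (not the SIOD implementation):
// return the value if the variable is set; otherwise throw with the
// supplied message, or return nullptr ("nil") when no message is given.
const std::string *get_lval(const Env &env, const std::string &name,
                            const char *message)
{
    auto it = env.find(name);
    if (it != env.end())
        return &it->second;
    if (message != nullptr)
        throw std::runtime_error(message);
    return nullptr;
}
```

Passing a message suits variables a module cannot work without, as with the duration table here; passing a null message suits optional settings that have a sensible default.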
Now that we have the duration data we can go through each segment in the utterance and add the duration. The loop looks like
        for (s=u->relation("Segment")->head(); s != 0; s = next(s))
        {
We can look up the average duration of the current segment name using the function siod_assoc_str. As arguments, it takes the segment name s->name() and the assoc list of phones and durations.
            ldur = siod_assoc_str(s->name(),ph_avgs);
Note the return value is actually a LISP pair (phone name and duration), or nil if the phone isn’t in the list. Here we check if the segment is in the list. If it is not we print an error and set the duration to 100 ms; if it is in the list the floating point number is extracted from the LISP pair.
            if (ldur == NIL)
            {
                cerr << "Phoneme: " << s->name() << " no duration " << endl;
                dur = 0.100;
            }
            else
                dur = get_c_float(car(cdr(ldur)));
If this phone is in an accented syllable we wish to increase its duration by a factor of 1.5. To find out if it is accented we use the feature system to find the syllable this phone is part of and find out if that syllable is accented.
            if (ffeature(s,"R:SylStructure.parent.accented") == 1)
                dur *= 1.5;
Now that we have the desired duration we increment the running end time with our predicted duration for this segment and set the end of the current segment.
            end += dur;
            s->fset("end",end);
        }
Finally we return the utterance from the function.
        return utt;
    }
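To check the arithmetic of the loop without building Festival, the same logic can be sketched with standard containers. This is only an illustration under stated assumptions: `Segment` and `assign_simple_durations` are hypothetical names, a `std::map` stands in for the assoc list, and a boolean field replaces the SylStructure feature lookup.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a Segment item: a phone name, whether its
// parent syllable is accented, and the end time the module fills in.
struct Segment {
    std::string name;
    bool accented;
    float end;
};

// Mirrors the loop of the duration module: look up each phone's average
// duration, fall back to 100 ms for unknown phones, lengthen phones in
// accented syllables by a factor of 1.5, and accumulate end times.
void assign_simple_durations(std::vector<Segment> &segs,
                             const std::map<std::string, float> &averages)
{
    float end = 0.0f;
    for (Segment &s : segs) {
        auto it = averages.find(s.name);
        float dur = (it == averages.end()) ? 0.100f : it->second;
        if (s.accented)
            dur *= 1.5f;
        end += dur;
        s.end = end;
    }
}
```

With averages of 0.05 s for one phone and 0.10 s for an accented second phone, followed by an unknown phone, the end times come out at roughly 0.05, 0.20 and 0.30 seconds.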
Once a module is defined it must be declared to the system so it may be called. To do this one must call the function festival_def_utt_module, which takes a LISP name, the C++ function name and a documentation string describing what the module does. The module will then automatically be available at run-time and added to the manual. The call to this function should be added to the initialization function in the directory you are adding the module to. The function is called festival_DIRNAME_init(). If one doesn’t exist you’ll need to create it.
In ‘./src/Duration/’ the function festival_Duration_init() is at the end of the file ‘dur_aux.cc’. Thus we can add our new module’s declaration at the end of that function. But first we must declare the C++ function in that file. Thus above that function we would add
    LISP FT_Duration_Simple(LISP args);
While at the end of the function festival_Duration_init() we would add
    festival_def_utt_module("Duration_Simple",FT_Duration_Simple,
    "(Duration_Simple UTT)\n\
    Label all segments with average duration ... ");
In order for our new file to be compiled we must add it to the ‘Makefile’ in that directory, to the SRCS variable. Then when we type make in ‘./src/’ our new module will be properly linked in and available for use.
Of course we are not quite finished. We still have to say when our new duration module should be called. When we set
    (Parameter.set 'Duration_Method Duration_Simple)
for a voice, calls to the function utt.synth will use our new duration module.
Note in earlier versions of Festival it was necessary to modify the duration calling function in ‘lib/duration.scm’ but that is no longer necessary.
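The idea of declaring a module under a name and later selecting it by that name can be illustrated with a small registry. This is a conceptual sketch only, not how Festival implements festival_def_utt_module or Parameter.set; `Utterance`, `def_utt_module`, `apply_module` and `duration_simple` are hypothetical.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an utterance; real modules receive a LISP form.
struct Utterance {
    std::string duration_source;  // records which module last ran
};

using UttModule = std::function<Utterance &(Utterance &)>;

struct ModuleEntry {
    UttModule fn;
    std::string doc;  // documentation string kept alongside the function
};

// Single table of declared modules, keyed by their LISP-style name.
std::map<std::string, ModuleEntry> &module_table()
{
    static std::map<std::string, ModuleEntry> table;
    return table;
}

// Analogue of declaring a module: name, function, documentation string.
void def_utt_module(const std::string &name, UttModule fn,
                    const std::string &doc)
{
    module_table()[name] = ModuleEntry{fn, doc};
}

// Analogue of dispatching on a configured method name.
Utterance &apply_module(const std::string &name, Utterance &utt)
{
    auto it = module_table().find(name);
    if (it == module_table().end())
        throw std::runtime_error("unknown module: " + name);
    return it->second.fn(utt);
}

// A trivial module in the fixed form: take an utterance, return it.
Utterance &duration_simple(Utterance &utt)
{
    utt.duration_source = "Duration_Simple";
    return utt;
}
```

Keeping the documentation string next to the function is what makes it possible to list modules in generated documentation without a separate registry.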
In this example we will make more direct use of the utterance structure, showing the gory details of following relations in an utterance. This time we will create a module that will name all syllables with a concatenation of the names of the segments they are related to.
As before we need the same standard includes
    #include "festival.h"
Now the definition of the function
    LISP FT_Name_Syls(LISP utt)
    {
As with the previous example we are called with an utterance LISP object and will return the same. The first task is to extract the utterance object from the LISP object.
        EST_Utterance *u = get_c_utt(utt);
        EST_Item *syl,*seg;
Now for each syllable in the utterance we want to find which segments are related to it.
        for (syl=u->relation("Syllable")->head(); syl != 0; syl = next(syl))
        {
Here we declare a variable to accumulate the names of the segments.
            EST_String sylname = "";
Now we iterate through the SylStructure daughters of the syllable. These will be the segments in that syllable.
            for (seg=daughter1(syl,"SylStructure"); seg; seg=next(seg))
                sylname += seg->name();
Finally we set the syllable’s name to the concatenated name, and loop to the next syllable.
            syl->set_name(sylname);
        }
Finally we return the LISP form of the utterance.
        return utt;
    }
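The nested traversal (each syllable, then each of its daughter segments) can be tried out in isolation with standard containers. In this sketch `Syllable` and `name_syllables` are hypothetical, and a vector of segment names stands in for following the SylStructure relation.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a syllable and its SylStructure daughters.
struct Syllable {
    std::string name;
    std::vector<std::string> segments;  // daughter segment names, in order
};

// Mirrors the module above: name each syllable by concatenating the
// names of the segments it dominates.
void name_syllables(std::vector<Syllable> &syls)
{
    for (Syllable &syl : syls) {
        std::string sylname;
        for (const std::string &seg : syl.segments)
            sylname += seg;
        syl.name = sylname;
    }
}
```

A syllable with no daughters ends up with an empty name, which matches what the loop in the module would produce when daughter1 returns nothing.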
In this example we will add a whole new subsystem. This will often be a common way for people to use Festival. For example, let us assume we wish to add a formant waveform synthesizer (e.g. like that in the free ‘rsynth’ program). In this case we will add a whole new sub-directory to the modules directory. Let us call it ‘rsynth/’.
In the directory we need a ‘Makefile’ of the standard form so we should copy one from one of the other directories, e.g. ‘Intonation/’. Standard methods are used to identify the source code files in a ‘Makefile’ so that the ‘.o’ files are properly added to the library. Following the other examples will ensure your code is integrated properly.
We’ll just skip over the bit where you extract the information from the utterance structure and synthesize the waveform (see ‘donovan/donovan.cc’ or ‘diphone/diphone.cc’ for examples).
To get Festival to use your new module you must tell it to compile the directory’s contents. This is done in ‘festival/config/config’. Add the line
    ALSO_INCLUDE += rsynth
to the end of that file (there are similar ones mentioned). Simply adding the name of the directory here will add that as a new module and the directory will be compiled.
What you must provide in your code is a function festival_DIRNAME_init() which will be called at initialization time. In this function you should call any further initialization required and define any new Lisp functions you wish to make available to the rest of the system. For example, in the ‘rsynth’ case we would define in some file in ‘rsynth/’
    #include "festival.h"

    static LISP utt_rsynth(LISP utt)
    {
        EST_Utterance *u = get_c_utt(utt);
        // Do formant synthesis
        return utt;
    }

    void festival_rsynth_init()
    {
        proclaim_module("rsynth");

        festival_def_utt_module("Rsynth_Synth",utt_rsynth,
        "(Rsynth_Synth UTT) A simple formant synthesizer");
        ...
    }
Integration of the code in optional (and standard) directories is done by automatically creating ‘src/modules/init_modules.cc’ for the list of standard directories plus those defined as ALSO_INCLUDE. A call to a function called festival_DIRNAME_init() will be made for each.
This mechanism is specifically designed so you can add modules to the system without changing anything in the standard distribution.
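As a rough picture of the mechanism, a generated initialization file only needs to call one initializer per directory. The sketch below is an assumption about the general shape, not the actual contents of the generated file; all names (`proclaimed_modules`, `sketch_Duration_init`, `sketch_rsynth_init`, `sketch_init_modules`) are hypothetical.

```cpp
#include <string>
#include <vector>

// Records which directories have announced themselves; a stand-in for
// whatever bookkeeping the real system does when a module is proclaimed.
std::vector<std::string> &proclaimed_modules()
{
    static std::vector<std::string> names;
    return names;
}

// Hypothetical per-directory initializers following the
// festival_DIRNAME_init() naming pattern.
void sketch_Duration_init() { proclaimed_modules().push_back("Duration"); }
void sketch_rsynth_init()   { proclaimed_modules().push_back("rsynth"); }

// Sketch of a generated file: one call per standard directory,
// followed by one call per directory listed in ALSO_INCLUDE.
void sketch_init_modules()
{
    sketch_Duration_init();  // standard directory
    sketch_rsynth_init();    // added through ALSO_INCLUDE
}
```

Because the list of calls is generated from the configured directory names, adding a directory changes only the configuration and the new directory itself.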
This example shows you how to add a new object to Scheme and add wraparounds to allow manipulation within the Scheme (and C++) domain.
As in the previous example, we are assuming this is done in a new directory.
Suppose you have a new object called Widget that can transduce a string into some other string (with some optional continuous parameter). Thus, here we create a new file ‘widget.cc’ like this
    #include "festival.h"
    #include "widget.h"    // definitions for the widget class
In order to register the widgets as Lisp objects we actually need to register them as EST_Val’s as well. Thus we now need
    VAL_REGISTER_CLASS(widget,Widget)
    SIOD_REGISTER_CLASS(widget,Widget)
The first name given to these macros should be a short mnemonic name for the object that will be used in defining a set of access and construction functions. It must of course be unique within the whole system. The second name is the name of the object itself.
To understand its usage we can add a few simple widget manipulation functions
    LISP widget_load(LISP filename)
    {
        EST_String fname = get_c_string(filename);
        Widget *w = new Widget;    // build a new widget

        if (w->load(fname) == 0)   // successful load
            return siod(w);
        else
        {
            cerr << "widget load: failed to load \"" << fname << "\"" << endl;
            festival_error();
        }

        return NIL;  // for compilers that get confused
    }
Note that the function siod constructs a LISP object from a widget; the class register macro defines that for you. Also note that when you give an object to a LISP object, the LISP object then owns it and is responsible for deleting it when garbage collection occurs on that LISP object. Care should be taken that you don’t put the same object within different LISP objects. The macro VAL_REGISTER_CLASS_NODEL should be used if you do not want your given object to be deleted by the LISP system (this may cause leaks).
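The ownership rule can be made concrete with a small wrapper that either deletes its contents or leaves them alone. This is a sketch of the idea only, not of the registration macros: `WidgetCell` is hypothetical, the `Widget` here is a toy that counts live instances, and the wrapper's destructor stands in for garbage collection.

```cpp
// Toy widget that counts live instances so ownership is observable.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Hypothetical stand-in for a LISP object holding a widget. With
// owns == true it deletes the widget when it is destroyed (the default
// registration behaviour described above); with owns == false it
// leaves the widget alone (the NODEL behaviour).
class WidgetCell {
public:
    WidgetCell(Widget *w, bool owns) : w_(w), owns_(owns) {}
    ~WidgetCell() { if (owns_) delete w_; }

    // Copying is disabled: two owning cells for one widget would delete
    // it twice, which is the hazard the text warns about.
    WidgetCell(const WidgetCell &) = delete;
    WidgetCell &operator=(const WidgetCell &) = delete;

    Widget *get() const { return w_; }

private:
    Widget *w_;
    bool owns_;
};
```

In the non-owning case the widget outlives the cell, so some other part of the program has to delete it; forgetting to do so is the leak mentioned above.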
If you want to refer to these functions in other files within your modules you can use
    VAL_REGISTER_CLASS_DCLS(widget,Widget)
    SIOD_REGISTER_CLASS_DCLS(widget,Widget)
in a common ‘.h’ file.
The following defines a function that takes a LISP object containing a widget, applies some method and returns a string.
    LISP widget_apply(LISP lwidget, LISP string, LISP param)
    {
        Widget *w = widget(lwidget);
        EST_String s = get_c_string(string);
        float p = get_c_float(param);
        EST_String answer;

        answer = w->apply(s,p);

        return strintern(answer);
    }
The function widget, defined by the registration macros, takes a LISP object and returns a pointer to the widget inside it. If the LISP object does not contain a widget an error will be thrown.
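The checked extraction can be thought of as a type tag that is compared before the pointer is handed back. The sketch below illustrates that idea under assumptions: `Cell`, `make_widget_cell` and `widget_from` are hypothetical, and the real accessor generated by the macros may work differently.

```cpp
#include <stdexcept>
#include <string>

// Toy widget with a single field, enough to show extraction.
struct Widget {
    std::string label;
};

// Hypothetical stand-in for a LISP object: a type tag plus contents.
struct Cell {
    std::string type;
    void *contents;
};

// Wrap a widget in a tagged cell.
Cell make_widget_cell(Widget *w)
{
    return Cell{"widget", w};
}

// Return the widget inside the cell, or throw if the cell holds
// something else.
Widget *widget_from(const Cell &c)
{
    if (c.type != "widget")
        throw std::runtime_error("wrong type: expected widget, got " + c.type);
    return static_cast<Widget *>(c.contents);
}
```

Checking the tag at the point of extraction means functions such as widget_apply can assume a valid pointer and do not need their own type tests.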
Finally you wish to add these functions to the Lisp system
    void festival_widget_init()
    {
        init_subr_1("widget.load",widget_load,
        "(widget.load FILENAME)\n\
    Load in widget from FILENAME.");
        init_subr_3("widget.apply",widget_apply,
        "(widget.apply WIDGET INPUT VAL)\n\
    Returns widget applied to string INPUT with float VAL.");
    }
In your ‘Makefile’ for this directory you’ll need to add the include directory where ‘widget.h’ is, if it is not contained within the directory itself. This is done through the make variable LOCAL_INCLUDES as
    LOCAL_INCLUDES = -I/usr/local/widget/include
And for the linker you’ll need to identify where your widget library is. In your ‘festival/config/config’ file at the end add
    COMPILERLIBS += -L/usr/local/widget/lib -lwidget
This document was generated by Alan W Black on December 2, 2014 using texi2html 1.82.