A number of different duration prediction modules are available with varying levels of sophistication.
Segmental duration prediction is done by the module Duration, which calls a different method depending on the value of the parameter Duration_Method.
All of the following duration methods may be further affected by both a global duration stretch and a per-word one.
If the parameter Duration_Stretch is set, all absolute durations predicted by any of the duration methods described here are multiplied by the parameter's value. For example
(Parameter.set 'Duration_Stretch 1.2)
lengthens every segment by 20%, so everything is spoken more slowly.
In addition to the global stretch, if the feature dur_stretch on the related Token is set, it is also used as a multiplicative factor on the duration produced by the selected method. That is, R:Syllable.parent.parent.R:Token.parent.dur_stretch.
There is a Lisp function, duration_find_stretch, which returns the combined global and local duration stretch factor for a given segment item.
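The way the two factors combine can be sketched in a few lines of Scheme. This only illustrates the arithmetic: stretched_duration is a hypothetical helper, not a Festival function, and the numbers are made up.

```scheme
;; Hypothetical helper (not part of Festival): applies a global and a
;; per-token stretch factor to a predicted duration, in seconds.
(define (stretched_duration predicted global_stretch local_stretch)
  (* predicted global_stretch local_stretch))

;; A 0.1 second segment with Duration_Stretch 1.2 and dur_stretch 1.5
;; comes out at roughly 0.18 seconds.
(stretched_duration 0.1 1.2 1.5)
```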
Note that these global and local stretch factors are crude and should be considered hacks: uniform modification of durations is not what happens in real speech. They are typically used when the underlying duration method is lacking in some way, and in that role they can still be useful.
Note it is quite easy to implement new duration methods in Scheme directly.
If parameter Duration_Method is set to Default, the simplest duration model is used. All segments are 100 milliseconds (this can be modified by Duration_Stretch, and/or the localised Token related dur_stretch feature).
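A minimal sketch of selecting this method, assuming the method name is given as a quoted symbol in the same way as the value in the Duration_Stretch example:

```scheme
;; Every segment gets the fixed 100 millisecond duration ...
(Parameter.set 'Duration_Method 'Default)
;; ... which the global stretch then scales to 120 milliseconds.
(Parameter.set 'Duration_Stretch 1.2)
```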
If parameter Duration_Method is set to Averages then segmental durations are set to their averages. The variable phoneme_durations should be an a-list of phones and averages in seconds. The file `lib/mrpa_durs.scm' has an example for the mrpa phoneset.
If a segment is found that does not appear in the list, a default duration of 0.1 seconds is assigned and a warning message is generated.
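As a rough sketch, a setup for this method might look as follows. It assumes each a-list entry is a two-element list of phone name and average; the phones and values are invented for illustration and are not taken from `lib/mrpa_durs.scm'.

```scheme
;; Illustrative values only: each entry is (phone average-in-seconds).
(set! phoneme_durations
      '((a 0.080)
        (t 0.060)
        (s 0.095)))

;; Select the averages method; a phone missing from the list above
;; would fall back to 0.1 seconds with a warning.
(Parameter.set 'Duration_Method 'Averages)
```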
If parameter Duration_Method is set to Klatt, the duration rules from the Klatt book (allen87, chapter 9) are used. This method requires minimum and inherent durations for each phoneme in the phoneset. This information is held in the variable duration_klatt_params. Each member of this list is a three-tuple of phone name, inherent duration and minimum duration. An example for the mrpa phoneset is in `lib/klatt_durs.scm'.
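A sketch of what such a parameter list could look like is given here. The phones and numbers are invented, and the unit (milliseconds) is an assumption; consult `lib/klatt_durs.scm' for the real values and the units it uses.

```scheme
;; Each entry: (phone inherent-duration minimum-duration).
;; Values are invented and ASSUMED to be in milliseconds.
(set! duration_klatt_params
      '((a 230.0 80.0)
        (t 75.0 50.0)
        (s 125.0 60.0)))

(Parameter.set 'Duration_Method 'Klatt)
```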
Two very similar methods of duration prediction by CART tree are supported. The first, used when parameter Duration_Method is Tree, simply predicts durations directly for each segment. The tree is set in the variable duration_cart_tree.
The second, which seems to give better results, is used when parameter Duration_Method is Tree_ZScores. In this second model the tree predicts zscores (number of standard deviations from the mean) rather than durations directly. (This follows campbell91, but we don't deal in syllable durations here.) This method requires means and standard deviations for each phone. The variable duration_cart_tree should contain the zscore prediction tree and the variable duration_ph_info should contain a list of phone, mean duration, and standard deviation for each phone in the phoneset.
An example tree trained from 460 sentences spoken by Gordon is in `lib/gswdurtreeZ'. Phone means and standard deviations are in `lib/gsw_durs.scm'.
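A possible way to wire these files into the zscore method is sketched below. The variable names gsw_duration_cart_tree and gsw_durs are assumptions about what the two files define, as is loading them with require; check the files for the names they actually use.

```scheme
;; ASSUMPTION: loading these files defines gsw_duration_cart_tree
;; (the zscore tree) and gsw_durs (phone, mean, standard deviation).
(require 'gswdurtreeZ)
(require 'gsw_durs)

(set! duration_cart_tree gsw_duration_cart_tree)
(set! duration_ph_info gsw_durs)
(Parameter.set 'Duration_Method 'Tree_ZScores)
```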
After prediction the segmental duration is calculated by the simple formula
duration = mean + (zscore * standard deviation)
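The formula can be worked through with made-up numbers. The helper zscore_to_duration is hypothetical and only mirrors the formula; it is not a Festival function.

```scheme
;; Hypothetical helper mirroring the formula; durations in seconds.
(define (zscore_to_duration zscore mean stddev)
  (+ mean (* zscore stddev)))

;; A phone with mean 0.080 and standard deviation 0.020, predicted to
;; be 1.5 standard deviations above its mean: 0.080 + (1.5 * 0.020),
;; i.e. about 0.110 seconds.
(zscore_to_duration 1.5 0.080 0.020)
```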
This method has also been used for other duration models that modify an inherent duration by some factor. If the tree predicts factors rather than zscores, and the duration_ph_info entries are phone, 0.0, inherent duration, then the above formula will generate the desired result. Klatt and Klatt-like rules can be implemented in this way without adding a new method.
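As an illustration of this trick, the entries below put 0.0 in the mean slot and an inherent duration in the standard deviation slot. The phones, inherent durations and the factor 0.8 are invented for this sketch.

```scheme
;; Entries as (phone 0.0 inherent-duration), durations in seconds.
(set! duration_ph_info
      '((a 0.0 0.120)
        (t 0.0 0.075)))

;; If the tree predicts a factor of 0.8 for "a", the zscore formula
;; reduces to 0.0 + (0.8 * 0.120), i.e. about 0.096 seconds.
(+ 0.0 (* 0.8 0.120))
```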