19. Duration

A number of different duration prediction modules are available with varying levels of sophistication.

Segmental duration prediction is done by the module Duration, which calls a different method depending on the value of the parameter Duration_Method.

All of the following duration methods may be further affected by both a global duration stretch and a per-word stretch.

If the parameter Duration_Stretch is set, all absolute durations predicted by any of the duration methods described here are multiplied by the parameter’s value. For example,

(Parameter.set 'Duration_Stretch 1.2)

will make all speech slower.

In addition to the global stretch, if the feature dur_stretch is set on the related Token, it is also used as a multiplicative factor on the duration produced by the selected method. That feature is R:Syllable.parent.parent.R:Token.parent.dur_stretch. There is a Lisp function, duration_find_stretch, which returns the combined global and local duration stretch factor for a given segment item.
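As a rough illustration of how the two factors combine, the following sketch multiplies a predicted duration by a global and a local stretch. The numbers are made up, and the variable local merely stands in for a Token's dur_stretch value; it is not read from an utterance here.

```scheme
;; Hypothetical worked example: combining a global and a per-token stretch.
(Parameter.set 'Duration_Stretch 1.2)          ; global stretch

(let ((predicted 0.100)                        ; model output in seconds (illustrative)
      (global (Parameter.get 'Duration_Stretch))
      (local 1.5))                             ; stand-in for the Token's dur_stretch
  (* predicted global local))                  ; roughly 0.18 seconds
```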

Note that these global and local methods of affecting the durations produced by the models are crude and should be considered hacks. Uniform modification of durations is not what happens in real speech. These parameters are typically used when the underlying duration method is lacking in some way, but they can still be useful.

Note that it is quite easy to implement new duration methods directly in Scheme.
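A minimal sketch of what such a method might look like is given below. It rests on two assumptions that are not stated in this chapter: that segment timing is held in an end feature (in seconds) on each item of the Segment relation, and that Duration_Method may be set to a function rather than a method name. The name my_fixed_duration is hypothetical.

```scheme
;; Sketch of a user-defined duration method (hypothetical name).
;; Assumes each Segment item carries its end time, in seconds,
;; in the feature "end".
(define (my_fixed_duration utt)
  (let ((end 0.0))
    (mapcar
     (lambda (seg)
       (set! end (+ end 0.080))          ; give every segment 80 ms
       (item.set_feat seg "end" end))
     (utt.relation.items utt 'Segment))
    utt))

;; Assumption: a function value is accepted here as well as a method name.
(Parameter.set 'Duration_Method my_fixed_duration)
```

As written, this sketch ignores Duration_Stretch and dur_stretch; a fuller version would multiply the 80 ms by the combined stretch factor for each segment.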

19.1 Default durations

If parameter Duration_Method is set to Default, the simplest duration model is used. All segments are given a duration of 100 milliseconds (this can be modified by Duration_Stretch and/or the local, Token-related dur_stretch feature).

19.2 Average durations

If parameter Duration_Method is set to Averages then segmental durations are set to their averages. The variable phoneme_durations should be an a-list of phones and averages in seconds. The file ‘lib/mrpa_durs.scm’ has an example for the mrpa phoneset.

If a segment is found that does not appear in the list, a default duration of 0.1 seconds is assigned and a warning message is generated.
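For illustration, a small phoneme_durations list might be set up as follows. The phone names and values are invented for this sketch and are not taken from ‘lib/mrpa_durs.scm’; passing the method name as a quoted symbol is also an assumption.

```scheme
;; Illustrative averages only: (phone average-duration-in-seconds).
(set! phoneme_durations
      '((#  0.200)
        (a  0.080)
        (t  0.065)
        (s  0.110)))

;; Select the averages method (quoted-symbol form assumed).
(Parameter.set 'Duration_Method 'Averages)
```

With this list, a segment named s would receive 0.110 seconds, while any phone missing from the list would fall back to 0.1 seconds with a warning.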

19.3 Klatt durations

If parameter Duration_Method is set to Klatt, the duration rules from the Klatt book (allen87, chapter 9) are used. This method requires minimum and inherent durations for each phoneme in the phoneset. This information is held in the variable duration_klatt_params. Each member of this list is a three-tuple of phone name, inherent duration and minimum duration. An example for the mrpa phoneset is in ‘lib/klatt_durs.scm’.
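The shape of duration_klatt_params can be sketched as below. The entries are invented, and the unit of the values is not stated here, so the unit used in ‘lib/klatt_durs.scm’ should be checked before writing real parameters. The quoted-symbol form of the method name is an assumption.

```scheme
;; Illustrative entries only: (phone inherent-duration minimum-duration).
;; Units are an assumption; follow those used in lib/klatt_durs.scm.
(set! duration_klatt_params
      '((a  230.0  80.0)
        (t   75.0  50.0)
        (s  125.0  60.0)))

(Parameter.set 'Duration_Method 'Klatt)
```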

19.4 CART durations

Two very similar methods of duration prediction by CART tree are supported. The first, used when parameter Duration_Method is Tree, simply predicts durations directly for each segment. The tree is set in the variable duration_cart_tree.

The second, which seems to give better results, is used when parameter Duration_Method is Tree_ZScores. In this second model the tree predicts zscores (number of standard deviations from the mean) rather than duration directly. (This follows campbell91, but we don’t deal in syllable durations here.) This method requires means and standard deviations for each phone. The variable duration_cart_tree should contain the zscore prediction tree and the variable duration_ph_info should contain a list of phone, mean duration, and standard deviation for each phone in the phoneset.

An example tree trained from 460 sentences spoken by Gordon is in ‘lib/gswdurtreeZ’. Phone means and standard deviations are in ‘lib/gsw_durs.scm’.
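A voice using the zscore method might be configured along the following lines. The names my_zscore_tree and my_phone_durs are hypothetical and assumed to have been loaded already (for example from files like those named above); the quoted-symbol form of the method name is also an assumption.

```scheme
;; Sketch of a Tree_ZScores setup.
;; my_zscore_tree and my_phone_durs are hypothetical variables that
;; are assumed to hold a zscore tree and a list of
;; (phone mean standard-deviation) entries respectively.
(set! duration_cart_tree my_zscore_tree)
(set! duration_ph_info my_phone_durs)
(Parameter.set 'Duration_Method 'Tree_ZScores)
```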

After prediction the segmental duration is calculated by the simple formula

duration = mean + (zscore * standard deviation)
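As a worked instance of this formula, the sketch below looks up a phone in a small, invented table in the same format as duration_ph_info and applies a predicted zscore. The names example_ph_info and zscore_to_duration are hypothetical and exist only for this illustration.

```scheme
;; Invented (phone mean standard-deviation) entries, in seconds.
(set! example_ph_info
      '((a 0.090 0.030)
        (t 0.060 0.020)))

;; duration = mean + (zscore * standard deviation)
(define (zscore_to_duration phone zscore)
  (let ((info (assoc phone example_ph_info)))
    (+ (car (cdr info))
       (* zscore (car (cdr (cdr info)))))))

(zscore_to_duration 'a 0.5)    ; 0.090 + (0.5 * 0.030), about 0.105 seconds
```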

This method has also been used for other duration models that modify an inherent duration by some factor. If the tree predicts factors rather than zscores, and the duration_ph_info entries are phone, 0.0, inherent duration, then the above formula will generate the desired result. Klatt and Klatt-like rules can be implemented in this way without adding a new method.
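The factor-based variant can be sketched as follows. The entries are invented: the mean slot is 0.0 and the standard deviation slot holds the inherent duration, so the formula reduces to factor times inherent duration.

```scheme
;; Illustrative entries only: (phone 0.0 inherent-duration-in-seconds).
(set! duration_ph_info
      '((a 0.0 0.120)
        (t 0.0 0.070)))

;; If the tree predicts a factor of 1.3 for a segment named a:
;;   duration = 0.0 + (1.3 * 0.120), about 0.156 seconds
(+ 0.0 (* 1.3 0.120))
```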

This document was generated by Alan W Black on December 2, 2014 using texi2html 1.82.