Building Synthetic Voices | ||
---|---|---|
<<< Previous | Limited domain synthesis | Next >>> |
Once you have decided on a set of utterances that appropriately covers the domain, you also need to consider how those particular text strings are synthesized. For example, if the data contains flight numbers, dates, times, etc., you must ensure that Festival renders them properly. Because we are discussing a limited domain, the distribution of token types will differ from that of standard text, but it will also be more constrained, so simple changes to the lexicon, the token-to-word rules, etc. will allow proper synthesis of these utterances.
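As an illustration of the kind of token-to-word rule meant here, the following is a minimal sketch in Python, not Festival's own rule mechanism. The function names `token_to_words` and `expand`, and the rule that a number following the word "flight" is read digit by digit, are assumptions made for the example.

```python
# Hypothetical illustration of a domain-specific token-to-word rule.
# This is NOT Festival code; names and the rule itself are assumptions.

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def token_to_words(token, prev_token=""):
    """Return the list of words to speak for a single token."""
    is_number = token.isascii() and token.isdigit()
    if is_number and prev_token.lower() == "flight":
        # In this assumed domain, flight numbers are read digit by digit.
        return [DIGIT_WORDS[int(d)] for d in token]
    # Every other token is passed through unchanged.
    return [token]


def expand(text):
    """Apply the token-to-word rule to each whitespace-separated token."""
    words = []
    prev = ""
    for token in text.split():
        words.extend(token_to_words(token, prev))
        prev = token
    return words


if __name__ == "__main__":
    print(expand("Flight 417 departs"))
```

Because the domain is constrained, a handful of context-dependent rules of this kind is often enough, whereas general text would need far broader coverage.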
One area of customization we have found particularly worthwhile is phrasing. It seems important to mark phrasing explicitly in the prompts and to have the speaker follow that phrasing, as this allows much better joins in unit selection as well as consistent prosody from the speaker. Thus, in the default code provided below, the normal phrasing module in Festival is replaced with one that treats punctuation as phrasal markers.
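The idea of treating punctuation as phrasal markers can be sketched as follows. This is a hedged illustration in Python, not the Festival phrasing module itself. The function name `mark_breaks` and the choice of which punctuation marks count as strong or weak are assumptions; the labels `BB`, `B` and `NB` (big break, break, no break) follow the naming Festival uses for phrase breaks.

```python
# Hypothetical illustration of punctuation-driven phrase break marking.
# This is NOT the Festival phrasing module; the punctuation sets are assumed.

STRONG = {".", "?", "!", ":"}   # assumed to mark a big break (BB)
WEAK = {",", ";"}               # assumed to mark a smaller break (B)
PUNCTUATION = "".join(sorted(STRONG | WEAK))


def mark_breaks(prompt):
    """Label each word of a prompt with BB, B or NB based on punctuation."""
    tokens = prompt.split()
    labelled = []
    for i, token in enumerate(tokens):
        word = token.rstrip(PUNCTUATION)
        trailing = token[len(word):]
        if i == len(tokens) - 1 or any(p in STRONG for p in trailing):
            label = "BB"   # end of utterance or strong punctuation
        elif any(p in WEAK for p in trailing):
            label = "B"    # weak punctuation
        else:
            label = "NB"   # no punctuation, so no break
        labelled.append((word, label))
    return labelled


if __name__ == "__main__":
    for word, label in mark_breaks("The time is now, exactly five past one."):
        print(word, label)
```

With a scheme like this, breaks occur only where the prompt is punctuated, so the speaker can be asked to pause in exactly those places and the recorded units match the phrasing the synthesizer will predict.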