Next: Recording in multiple styles Up: Unit Selection and Emotional Previous: Emphasis

Style

In our work on providing speech synthesizers for applications, we have found that the wider notions of emotion are rarely requested. However, particular styles have often been required for the applications we have worked with.

In our work providing voices for the AAC (Augmentative and Alternative Communication) market, where people who have lost (or never had) the ability to speak for themselves use hand-held devices to speak, style is very important because the synthetic voice becomes the person's own voice. Synthesizers based on news reader style speech, such as the Boston University Radio Corpus [11], produce voice output that still sounds like a news reader. An AAC device is primarily used for dialog rather than extended monologues; we therefore took this into account both in our instructions to the voice talent while recording and in the design of the utterances to record.

Delivery style is crucial in voice recording. In the recording of canned prompts, it is said that the phrase most often used by the voice coach is ``Say it again with a smile.'' As with the Genki vs News style weather described above, style in delivery defines the style of the synthesizer. Putting people in a small recording studio for hours on end and getting them to read thousands of sentences may be one reason why synthesizers often sound bored.

In the recent DARPA-funded Babylon project, we were part of a team that developed a two-way speech-to-speech translation system running on a standard PDA. Our Speechalator system offers English-to-Arabic and Arabic-to-English translation in the medical interview domain.

Apart from the non-trivial problems of running on such a limited platform, such systems require the voice output style to be appropriate for the message being delivered.

The first issue in style in speech-to-speech translation is that some utterances are commands, such as ``Put down your weapons'', while others should be delivered in a more compassionate style, such as ``Where does it hurt?''. An inappropriate style for either of these utterances will be detrimental to communication. In an earlier speech-to-speech system developed by us [12], we did not take such care, and the delivery of commands by the Croatian synthesizer was considered somewhat amusing by native speakers rather than being taken as actual commands.
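
The distinction between command and compassionate delivery can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a description of how Speechalator works: the `Style` labels, the `PHRASE_STYLES` table, and the `voice_...` names are all hypothetical, and it assumes a limited-domain system in which each output phrase can be given a style label when the domain is designed.

```python
from enum import Enum


class Style(Enum):
    """Hypothetical delivery styles for translated output."""
    COMMAND = "command"
    COMPASSIONATE = "compassionate"
    NEUTRAL = "neutral"


# Hypothetical phrase table: each output phrase carries a style label
# assigned by the domain designer (an assumption made for illustration).
PHRASE_STYLES = {
    "Put down your weapons": Style.COMMAND,
    "Where does it hurt?": Style.COMPASSIONATE,
}


def style_for(utterance: str) -> Style:
    """Return the labelled style for an utterance, or NEUTRAL if unlabelled."""
    return PHRASE_STYLES.get(utterance.strip(), Style.NEUTRAL)


def select_voice(utterance: str) -> str:
    """Name a (hypothetical) voice database recorded in the matching style."""
    return f"voice_{style_for(utterance).value}"
```

With this table, `select_voice("Put down your weapons")` names the command-style voice, while an unlabelled phrase falls back to the neutral one. A lookup table is the simplest choice for a closed domain; an open-domain system would need some way of inferring style from the text, which this sketch does not attempt.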


Alan W Black 2003-09-07