Building Synthetic Voices
Text analysis

Mark-up modes

In some situations it is possible for the user of a text-to-speech system to give the synthesizer more information than just the text, or the type of text. It is nearly impossible for TTS engines to get everything right all of the time, so in such situations it is useful to offer the developer a way to guide the synthesizer in its synthesis process.

Most speech synthesizers offer some markup method or embedded commands, but these are specific to one interface or one API. For example, the Microsoft SAPI interface allows various commands to be embedded in a text string; some examples .

More recently, however, there has been a move toward a more general markup method. A number of people saw the potential of XML as a general way of marking up text for speech synthesis. The earliest such method we know of was in a Masters thesis at Edinburgh in 1995 [isard]. This was later published under the name SSML. A number of other groups were also looking at this, and a large consortium formed to define it further under various names: STML, and eventually Sable.

Around the same time, more serious definitions of such a markup were being developed. The first to reach a well-defined stage was JSML (Java Speech Markup Language), which covered aspects of speech recognition and grammars as well as speech synthesis markup. Unlike any of the other XML-based markup languages, JSML, because it was embedded within Java, could define exceptions in a reasonable way. One of the problems with a simple XML markup is that it is one-way: you can request a voice, a language, or some other functionality, but there is no feedback mechanism to tell you whether that feature is actually available.

XML markup for speech has been advanced further by VoiceXML, which defines a markup language for basic dialog systems. The speech synthesis part of VoiceXML closely follows the functionality of JSML and its predecessors.

A new standard for speech synthesis markup is currently being defined by the W3C under the name SSML. Confusingly, this is the same name as the earliest example, but it is not designed to be compatible with the original; instead it takes into account the functionality and needs of TTS users. SSML is also the speech synthesis markup specified in Microsoft's SALT tags.
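To make the idea of marking up text concrete, here is a minimal sketch that wraps plain sentences in an SSML document using Python's standard xml.etree.ElementTree. It assumes the element and attribute names of the W3C SSML 1.0 recommendation (speak, voice, prosody, break); the function build_ssml and its parameters are hypothetical, chosen only for illustration. Note that the voice request is exactly the kind of one-way hint described above: nothing in the markup tells the caller whether a matching voice exists.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(sentences, voice_gender="female", rate="slow", pause="500ms"):
    """Hypothetical helper: wrap plain sentences in a minimal SSML 1.0 document.

    Each sentence is spoken at the requested prosody rate, with a fixed
    pause inserted between consecutive sentences.
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,      # SSML default namespace
        "xml:lang": "en-US",   # language of the enclosed text
    })
    # A voice request is only a hint: the markup has no way to report back
    # whether the synthesizer actually has such a voice.
    voice = ET.SubElement(speak, "voice", {"gender": voice_gender})
    for i, text in enumerate(sentences):
        if i > 0:
            # Explicit pause between sentences.
            ET.SubElement(voice, "break", {"time": pause})
        prosody = ET.SubElement(voice, "prosody", {"rate": rate})
        prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(["Hello there.", "This text is marked up."]))
```

A synthesizer that accepts SSML would read the resulting string in place of the raw text; one that does not support a requested feature may simply ignore it.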

%%%%%%%%%%%%%%%%%%%%%%
Discussion to be added
%%%%%%%%%%%%%%%%%%%%%%
