Overview
July 31st-August 4th, 2000, at Carnegie Mellon University in
Pittsburgh, Pennsylvania.
This 5-day course will allow attendees to gain practice in building
synthetic voices for speech applications. The course is aimed at
developers of speech systems who wish to take better advantage
of state-of-the-art synthesis techniques.
The number of attendees to this course will be limited. Course fees
will be $2,500 per person. Accommodation and meals are extra.
To register, please contact
Kevin A. Lenzo (lenzo@cs.cmu.edu)
Attendees of the course will:
- gain an understanding of the basic components of state-of-the-art
speech synthesis technology, and their relative complexity
- gain practical experience in building new voices and in the trade-offs
between general TTS (diphone, general unit selection) synthesizers
and targeted and limited domain synthesizers
- gain practical experience in tailoring voices and their
applications to get the best compromise of quality, speed and
ease of construction
This course is based on the FestVox Document and uses Edinburgh
University's Festival Speech Synthesis System in all practicals. As these
tools are free for commercial use, techniques learned in this course may
be applied to building new commercial voices without further
licensing.
Day one
- Basic synthesis techniques, history and future
- The Festival Speech Synthesis System and its usage
- Overview of building process and simple example
Day two
- Key synthesis components: text analysis, lexicons,
linguistic processing and waveform synthesis
- Customizing TTS for applications: tts_modes, markup, etc.
Day three
- Building new diphone voices
- Recording, labelling and corrections
- A larger example
Day four
- Unit selection synthesis
- Building limited domain voices
Day five
- Tuning, testing and correcting voices
- Future techniques
The course will involve lectures and practicals, including complete
walkthroughs for building your own voices, so that attendees
gain hands-on experience in actually building voices. Each attendee
will be given access to a computer during the course. Voices
built using these techniques can be run on any platform Festival
supports, including Linux, Solaris and Windows. However,
the course will use Linux workstations to demonstrate the use of
the voice building tools.
Presenters
The course will be taught and supervised by Alan W Black and Kevin A. Lenzo.
Alan W Black is a principal author of the Festival
Speech Synthesis System and has many years' experience in
designing and building speech synthesis systems and voices
in various languages, both in academia and industry. Before
that, he worked on a wide range of speech and language research
projects, all leading to practical implementations.
Kevin A. Lenzo also
has many years' experience in synthesis systems, both in academia
and industry, working on a wide range of synthesis systems,
including the development of PhoneBox, a multi-lingual unit
selection synthesizer. He is also a respected member of the Perl
programming language community, active in the open source
licensing movement, and currently steward of the CMU Sphinx open source speech
recognition project.
Both are committed to open source and to the transfer of technology
from research to practical applications. Together they have
authored the FestVox Document,
been part of building many voices in a number of
different languages, and developed the technology to make the
building of synthetic voices better, more reliable, and available
to a much larger community.