Formal Evaluation Tests

Once you and your immediate colleagues have tested the voice, you will want more formal evaluation metrics. Again, we are looking at diagnostic evaluation; comparative evaluation between different commercial synthesizers is quite a different task.

In our English checks we used Wall Street Journal and Time magazine articles (around 10 million words in total). Many unusual words (e.g. proper names) appear in only one article and are less important to add to the lexicon, but unusual words that appear across articles are more likely to appear again and so should be added.
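The selection rule above (prefer unknown words that recur across articles over those confined to a single article) can be sketched as a per-article count. This is only an illustrative sketch, not part of the provided scripts: the function name words_across_articles, the min_articles threshold, the simple tokenizer, and the toy articles and lexicon are all assumptions made for the example.

```python
import re
from collections import defaultdict

def words_across_articles(articles, lexicon, min_articles=2):
    """Return out-of-lexicon words found in at least min_articles articles.

    Hypothetical helper: 'articles' is a list of article texts and
    'lexicon' is a set of lower-case words already covered.
    """
    article_count = defaultdict(int)
    for text in articles:
        # Use a set so each word is counted at most once per article.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        for word in tokens:
            if word not in lexicon:
                article_count[word] += 1
    return sorted(w for w, n in article_count.items() if n >= min_articles)

# Toy data for illustration only.
articles = [
    "Gorbachev met the delegation in Moscow.",
    "Moscow reacted after Gorbachev spoke to Zyuganov.",
]
lexicon = {"met", "the", "delegation", "in", "reacted", "after", "spoke", "to"}
candidates = words_across_articles(articles, lexicon)
print(candidates)
```

Counting articles rather than raw occurrences keeps a name that is repeated many times within a single story from outranking a word that recurs in several stories.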

Be aware that using such data will bias your coverage towards that type of data. Our databases were mostly collected in the early 90s and hence have good coverage of the Gulf War and the changes in Eastern Europe, but our ten million words contain no occurrences of the words "Sojourner" or "Lewinsky", which only appear in stories from later in the decade.

A script is provided in src/general/find_unknowns which will analyze given text to find which words do not appear in the current lexicon. You should use the -eval option to select your voice. Note that this checks which words are not in the lexicon itself: it replaces whatever letter-to-sound/unknown word function you specified, and saves any words for which that function is called in the given output file. For example:

find_unknowns -eval '(voice_ked_diphone)' -output cmudict.unknown \

Normally you would run this over your database, accumulate the unknown words, and then synthesize each unknown word and listen to it to evaluate whether your LTS system produces reasonable results. For those words which do not have acceptable pronunciations, add them to your lexicon.
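The accumulation step can be sketched as a simple merge of the unknown-word lists from several runs. This is a hedged illustration only: it assumes each run's output holds one word per line, which is an assumption about the output format and not something stated here, and the name accumulate_unknowns and the sample lists are hypothetical.

```python
from collections import Counter

def accumulate_unknowns(runs):
    """Merge unknown-word lists from several runs into one frequency table.

    Hypothetical helper: each element of 'runs' is an iterable of lines,
    ASSUMED to hold one unknown word per line.
    """
    counts = Counter()
    for lines in runs:
        for line in lines:
            word = line.strip().lower()
            if word:  # skip blank lines
                counts[word] += 1
    return counts

# Made-up sample lists standing in for the contents of two output files.
run_a = ["Sojourner\n", "Zyuganov\n", "sojourner\n"]
run_b = ["Sojourner\n", "\n"]
totals = accumulate_unknowns([run_a, run_b])

# List the most frequent unknown words first, so they are checked first.
for word, n in totals.most_common():
    print(n, word)
```

In practice each element of runs could be an open file, since iterating over a file yields its lines. Sorting by frequency puts the words most likely to recur at the top of the listening queue.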

Semantically unpredictable sentences

One technique that has been used to evaluate speech synthesis quality is testing against semantically unpredictable sentences.

Discussion to be added