Next: Discussion Up: Unit Size in Unit Previous: Unit Size

Experiment

To investigate the optimal unit size, we built synthesizers with four different unit types: syllable, diphone, phone and half phone.

The phone synthesizer, the base case, was built with the phone set, letter-to-sound rules and syllabification rules defined for Indian languages.

To build the diphone synthesizer, we tagged each phone with its preceding phone. The units were thus still one phone in length, but they were sub-typed by their previous phone.
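The tagging step can be illustrated with a minimal sketch. It is not the code used in the experiment: the function name `tag_with_previous`, the `pau` boundary symbol and the `prev_phone` naming scheme are all assumptions made for illustration.

```python
def tag_with_previous(phones, boundary="pau"):
    """Sub-type each phone by the phone that precedes it.

    The unit names ('<previous>_<phone>') and the 'pau' boundary symbol
    are hypothetical; the original naming scheme is not given in the text.
    """
    units = []
    previous = boundary
    for phone in phones:
        # The unit still spans a single phone; only its type name changes.
        units.append(f"{previous}_{phone}")
        previous = phone
    return units


print(tag_with_previous(["k", "a", "m", "a", "l"]))
# ['pau_k', 'k_a', 'a_m', 'm_a', 'a_l']
```

Note that the two occurrences of "a" end up as different unit types (`k_a` and `m_a`), which is the sub-typing effect described above.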

For the syllable-based synthesizer, we treated the 2344 distinct syllables in the database as "phones" and listed them in our phoneset. These syllable-sized phones were assigned phonetic features based on their combined consonant and vowel parts, with the consonant in the onset given preference over the consonant in the coda. Thus the units in the inventory became full syllables rather than traditional phonemes. The lexicon parser was modified accordingly to generate these syllable-based phones rather than traditional phone names.

In implementing the half phone synthesizer, each vowel was represented by two half phones, while the consonants remained full phones. Two phone symbols were defined for each vowel in the phoneset; for example, the vowel /a/ was represented by /a_1/ and /a_2/. Labels at the half phone level were derived by dividing each vowel segment equally into two half phones. The lexicon parser was also modified accordingly to generate the appropriate phone strings.
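The derivation of half phone labels can be sketched as follows. This is only an illustration under assumptions: the segment format `(phone, start, end)`, the function name `split_half_phones` and the small vowel set are hypothetical and do not reproduce the actual phone set or label files.

```python
# Illustrative vowel set only; the real phone set is not listed in the text.
VOWELS = {"a", "i", "u", "e", "o"}


def split_half_phones(segments, vowels=VOWELS):
    """Split every vowel segment at its midpoint into '<v>_1' and '<v>_2'.

    segments: list of (phone, start_time, end_time) tuples, times in seconds.
    Consonants are passed through unchanged as full phones.
    """
    labels = []
    for phone, start, end in segments:
        if phone in vowels:
            midpoint = (start + end) / 2.0
            labels.append((f"{phone}_1", start, midpoint))
            labels.append((f"{phone}_2", midpoint, end))
        else:
            labels.append((phone, start, end))
    return labels


print(split_half_phones([("k", 0.0, 0.25), ("a", 0.25, 0.75)]))
# [('k', 0.0, 0.25), ('a_1', 0.25, 0.5), ('a_2', 0.5, 0.75)]
```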

For perceptual evaluation of these synthesizers, we selected a set of 24 sentences from a Hindi news bulletin. The content of this bulletin was mostly about world political affairs in the middle of March 2003. The syllables and diphones present in these 24 sentences were covered in the corresponding synthesizers. The sentences were synthesized by the phone, diphone, syllable and half phone synthesizers and evaluated in perceptual tests by native Hindi speakers. The participants were working persons and graduate students, none of whom had any experience in speech synthesis. Each listener took an AB-test, i.e., the same sentence synthesized by two different synthesizers was played in random order and the listener was asked to decide which one sounded better. Listeners could also rate the two versions as equal.
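One way to prepare such an AB-test is sketched below. It assumes that "random order" refers to the order of the two versions within each pair; the function name `make_ab_trials`, the system labels and the fixed seed are hypothetical and are not taken from the original test setup.

```python
import random


def make_ab_trials(sentence_ids, system_a, system_b, seed=0):
    """Build one AB trial per sentence with the two systems in random order.

    Assumption: randomization applies to which system is played first in
    each pair. A seeded generator keeps the sketch reproducible.
    """
    rng = random.Random(seed)
    trials = []
    for sentence_id in sentence_ids:
        pair = [system_a, system_b]
        rng.shuffle(pair)
        trials.append((sentence_id, pair[0], pair[1]))
    return trials


trials = make_ab_trials(range(1, 25), "syllable", "diphone")
print(len(trials))  # 24
```

Each trial then yields one of three responses: first version preferred, second version preferred, or no preference.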

The results of the AB-tests, conducted with 11 listeners for the syllable versus diphone comparison and with 5 listeners for each of the other comparisons, are shown in Tables 1-6, with a summary in Table 7. Each row in these tables gives the evaluation results of one native speaker. For example, the entry 8 6 10 in the first row of Table 1 indicates that the listener rated 8 utterances in favor of syllable, 6 utterances in favor of phone and 10 utterances as equally good or bad. The last row of each table gives the column totals.

Table 1: AB Test: Syllable Vs Phone
            Listener Preference
Test No.    Syllable    Phone    No Preference
1.          8           6        10
2.          5           4        15
3.          9           -        15
4.          9           9        6
5.          9           7        8
Total       40          26       54
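As a consistency check, the last row of Table 1 can be recomputed from the per-listener rows. The sketch below assumes that a "-" entry stands for a count of zero.

```python
# Per-listener rows of Table 1: (syllable, phone, no preference).
# Assumption: a "-" entry in the table is read as a count of 0.
table1_rows = [
    (8, 6, 10),
    (5, 4, 15),
    (9, 0, 15),
    (9, 9, 6),
    (9, 7, 8),
]

# Every listener judged the same 24 sentence pairs.
assert all(sum(row) == 24 for row in table1_rows)

# Column totals, which should match the last row of the table.
totals = tuple(sum(column) for column in zip(*table1_rows))
print(totals)  # (40, 26, 54)
```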

Table 2: AB Test: Syllable Vs Halfphone
            Listener Preference
Test No.    Syllable    Halfphone    No Preference
1.          2           4            18
2.          9           3            12
3.          10          6            8
4.          4           -            20
5.          3           4            17
Total       28          17           75

Table 3: AB Test: Syllable Vs Diphone
            Listener Preference
Test No.    Syllable    Diphone    No Preference
1.          13          8          3
2.          7           2          15
3.          4           4          16
4.          8           5          11
5.          11          6          7
6.          13          5          6
7.          10          8          6
8.          11          8          5
9.          11          6          7
10.         14          1          9
11.         12          12         -
Total       114         65         85

Table 4: AB Test: Diphone Vs Phone
            Listener Preference
Test No.    Diphone    Phone    No Preference
1.          7          8        9
2.          4          4        16
3.          3          4        17
4.          8          6        10
5.          13         6        5
Total       35         28       57

Table 5: AB Test: Diphone Vs Halfphone
            Listener Preference
Test No.    Diphone    Halfphone    No Preference
1.          6          5            13
2.          5          7            12
3.          11         5            8
4.          1          5            18
5.          -          7            17
Total       23         29           68

Table 6: AB Test: Phone Vs Halfphone
            Listener Preference
Test No.    Phone    Halfphone    No Preference
1.          5        3            16
2.          5        6            13
3.          7        8            9
4.          2        -            22
5.          1        5            18
Total       20       22           78


Table 7: Summary of AB Test (scores in %; "=" denotes no preference)
Rank    Syl vs Diph    Syl vs Ph    Syl vs Halfph    Diph vs Ph    Diph vs Halfph    Ph vs Halfph
I       syl 43%        = 45%        = 63%            = 47%         = 57%             = 65%
II      = 32%          syl 33%      syl 23%          diph 29%      halfph 24%        halfph 18%
III     diph 24%       ph 21%       halfph 14%       ph 23%        diph 19%          ph 17%
Sum.    syl            syl          syl              diph          halfph            halfph
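The percentages in Table 7 can be recomputed from the last rows of Tables 1-6, as in the sketch below. The function names are hypothetical, and reading the "Sum." row as the system with more votes once "no preference" responses are set aside is an interpretation, not something the text states. Recomputed values may differ from the table by one percentage point, since some table entries appear to be truncated and others rounded.

```python
# Last-row counts from Tables 1-6; "=" is the no-preference column.
totals = {
    "Syl vs Diph":    {"syl": 114, "diph": 65, "=": 85},
    "Syl vs Ph":      {"syl": 40, "ph": 26, "=": 54},
    "Syl vs Halfph":  {"syl": 28, "halfph": 17, "=": 75},
    "Diph vs Ph":     {"diph": 35, "ph": 28, "=": 57},
    "Diph vs Halfph": {"diph": 23, "halfph": 29, "=": 68},
    "Ph vs Halfph":   {"ph": 20, "halfph": 22, "=": 78},
}


def ranked_percentages(counts):
    """Return (label, percent) pairs sorted from highest to lowest share."""
    n = sum(counts.values())
    shares = [(label, 100.0 * count / n) for label, count in counts.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


def preferred_system(counts):
    """Return the system with more votes, ignoring no-preference responses."""
    systems = {label: count for label, count in counts.items() if label != "="}
    return max(systems, key=systems.get)


for name, counts in totals.items():
    ranking = ", ".join(
        f"{label} {share:.1f}%" for label, share in ranked_percentages(counts)
    )
    print(f"{name}: {ranking} -> {preferred_system(counts)}")
```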


Alan W Black 2003-10-20