

A Tilt labelling of an utterance is a sequence of intonational events, each of one of four basic types: pitch accents, boundary tones, connections, and silence (labelled a, b, c, sil). Each event carries a number of continuous parameters. All events have a start parameter giving the fundamental frequency at the start of the event (measured in Hertz). Pitch accents and boundary tones are also described by a duration (seconds), an absolute amplitude (Hertz), the peak position at which the rising portion of the event stops and the fall begins (measured in seconds from the start of the vowel), and a tilt value representing the "tilt" of the accent (described below). Figure 1 shows how the parameters relate to an example pitch accent.
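As an illustration only, the event inventory described above could be held in a small record type. The sketch below is in Python, and the class name `TiltEvent`, its field names, and the validation rules are assumptions made for this example; they are not taken from any particular Tilt implementation. Fields that the text attributes only to pitch accents and boundary tones are left optional for the other event types.

```python
from dataclasses import dataclass
from typing import Optional

# Event labels as given in the text: accent, boundary, connection, silence.
EVENT_TYPES = ("a", "b", "c", "sil")


@dataclass
class TiltEvent:
    """One labelled intonational event (hypothetical layout)."""

    label: str                             # one of EVENT_TYPES
    start_f0: float                        # F0 at the start of the event, in Hz
    duration: Optional[float] = None       # seconds
    amplitude: Optional[float] = None      # absolute amplitude, in Hz
    peak_position: Optional[float] = None  # seconds from the start of the vowel
    tilt: Optional[float] = None           # between -1 and 1

    def __post_init__(self) -> None:
        if self.label not in EVENT_TYPES:
            raise ValueError(f"unknown event label: {self.label!r}")
        shape = (self.duration, self.amplitude, self.peak_position, self.tilt)
        # Accents and boundary tones need the full parameter set.
        if self.label in ("a", "b") and any(v is None for v in shape):
            raise ValueError("accents and boundary tones need all parameters")
        if self.tilt is not None and not -1.0 <= self.tilt <= 1.0:
            raise ValueError("tilt must lie between -1 and 1")
```

A labelled utterance would then simply be a list of such records in time order.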

The tilt parameter represents the relative amounts of rise and fall in the accent. The starting F0 of an event acts as the reference point from which all other calculations are made. The absolute amplitude parameter has two portions. The first is the absolute amplitude of the rise, from the starting F0 to the peak. The second is the absolute amplitude of the fall, from the peak to the end of the event. Either portion may be zero, if the event is a simple rise or a simple fall. The two portions are added together to form the absolute amplitude value. The tilt parameter is the difference of the two amplitudes (rise minus fall) divided by their sum [10]:

\[ \mathrm{tilt} = \frac{|A_{\mathrm{rise}}| - |A_{\mathrm{fall}}|}{|A_{\mathrm{rise}}| + |A_{\mathrm{fall}}|} \]


The tilt parameter ranges from -1 to 1, where -1 is a pure fall, 1 is a pure rise, and 0 indicates equal portions of rise and fall.
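The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `tilt_parameters` and the choice to pass the F0 values at the start, peak, and end of the event are assumptions made for the example.

```python
def tilt_parameters(start_f0: float, peak_f0: float, end_f0: float):
    """Return (absolute amplitude in Hz, tilt) for one event.

    Assumes the F0 values (in Hz) at the start, peak, and end of the
    event are already known.
    """
    rise = abs(peak_f0 - start_f0)  # amplitude from start to peak
    fall = abs(peak_f0 - end_f0)    # amplitude from peak to end
    amplitude = rise + fall         # absolute amplitude parameter
    if amplitude == 0:
        # A flat event has no rise or fall, so tilt is undefined here.
        raise ValueError("tilt is undefined for a zero-amplitude event")
    tilt = (rise - fall) / amplitude
    return amplitude, tilt


# A rise of 40 Hz followed by a fall of 20 Hz.
amplitude, tilt = tilt_parameters(100.0, 140.0, 120.0)
print(amplitude, tilt)
```

In this example the rise is larger than the fall, so the tilt value is positive. A simple rise (peak at the end of the event) gives a tilt of 1, and a simple fall (peak at the start) gives -1.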

Figure 1: Tilt parameters

Kurt Dusterhoff
Tue Jul 1 11:51:11 BST 1997