Blizzard 2007
in conjunction with the
Sixth ISCA Workshop on Speech Synthesis
Bonn, Germany / August 25, 2007

ATRECSS -- ATR English Speech Corpus For Speech Synthesis

Jinfu Ni (1,2), Toshio Hirai (3), Hisashi Kawai (2,4), Tomoki Toda (1,5), Keiichi Tokuda (1,6), Minoru Tsuzaki (1,7), Shinsuke Sakai (1,2), Ranniery Maia (1,2), Satoshi Nakamura (1,2)

(1) National Institute of Information and Communications Technology, Japan
(2) ATR Spoken Language Communication Labs, Japan
(3) Arcadia Inc., Japan
(4) KDDI Research and Development Labs, Japan
(5) Nara Institute of Science and Technology, Japan
(6) Nagoya Institute of Technology, Japan
(7) Kyoto City University of Arts, Japan

This paper introduces a large-scale, phonetically balanced English speech corpus developed at ATR for corpus-based speech synthesis. The corpus contains 16 hours of American English speech spoken by a professional male narrator in "reading style." The prompt sentences are drawn mainly from news articles, travel conversations, and novels. They were selected from large text collections using a greedy algorithm that maximizes the coverage of linguistic units such as diphones and triphones. A few measures were taken during recording to control undesirable variation in voice quality, both in the short term (daily) and in the long term (monthly). Statistics are presented for the full corpus and for the subsets provided for Blizzard Challenge 2006 and 2007.
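
The greedy selection of prompt sentences can be illustrated with a minimal sketch. This is not the procedure used to build the corpus, whose details are not given here; it only shows the general idea of picking, at each step, the candidate sentence that adds the most linguistic units not yet covered. The names `units` and `greedy_select` are hypothetical, and adjacent letter pairs stand in for diphones because a real system would first need a pronunciation lexicon to convert text into phone sequences.

```python
from typing import List, Set


def units(sentence: str) -> Set[str]:
    """Assumed stand-in for diphone extraction: adjacent letter pairs in each word."""
    found: Set[str] = set()
    for word in sentence.lower().split():
        letters = [c for c in word if c.isalpha()]
        for a, b in zip(letters, letters[1:]):
            found.add(a + b)
    return found


def greedy_select(candidates: List[str], max_sentences: int) -> List[str]:
    """Repeatedly pick the sentence that adds the most not-yet-covered units."""
    unit_sets = {s: units(s) for s in candidates}
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < max_sentences:
        # Gain of a sentence = number of its units that are still uncovered.
        best = max(remaining, key=lambda s: len(unit_sets[s] - covered))
        if not unit_sets[best] - covered:
            break  # no remaining sentence improves coverage
        selected.append(best)
        covered |= unit_sets[best]
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    pool = ["the news today", "a travel phrase", "the the the", "novel text"]
    for sentence in greedy_select(pool, max_sentences=3):
        print(sentence)
```

The loop stops either when the sentence budget is reached or when no remaining candidate contributes a new unit. A practical selection over a large text collection would likely weight units by frequency and avoid rescanning every candidate at each step, but those refinements are beyond this sketch.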

