Chinese Speech Corpus - THCHS30_daisycolour

http://blog.sina.com.cn/u/2331051670


(2018-06-12 09:24:27)

Tags: speech, corpus, Chinese, thchs

Category: AI/ML

Chinese speech corpus (13,388 clips)

Corpus URL: http://www.openslr.org/18/

data_thchs30.tgz [6.4G] ( speech data and transcripts ) Mirrors: [China]
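Once the archive is extracted, a quick sanity check is to count the audio clips and look at one transcript. The following is a minimal sketch, not part of the corpus tooling. It assumes that the clips are `.wav` files sitting in one directory (for example `data_thchs30/data`) and that each transcript file holds three non-empty lines: the character text, the pinyin, and the phonemes. Both the directory name and the three-line layout are assumptions to verify against your own copy; `count_clips` and `read_transcript` are hypothetical helper names.

```python
from pathlib import Path


def count_clips(clip_dir):
    """Count the .wav files directly inside clip_dir (non-recursive)."""
    return sum(1 for _ in Path(clip_dir).glob("*.wav"))


def read_transcript(trn_path):
    """Read one transcript file.

    Assumed layout (verify against your copy): three non-empty lines,
    in the order character text, pinyin, phonemes.
    """
    raw = Path(trn_path).read_text(encoding="utf-8").splitlines()
    lines = [ln.strip() for ln in raw if ln.strip()]
    keys = ("text", "pinyin", "phonemes")
    # zip stops at the shorter input, so a short file yields fewer keys.
    return dict(zip(keys, lines))


if __name__ == "__main__":
    clip_dir = Path("data_thchs30") / "data"  # assumed extraction path
    if clip_dir.is_dir():
        print("clips found:", count_clips(clip_dir))
```

The count is deliberately non-recursive, so that any other directories in the archive that link back to the same audio are not counted twice.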

About this resource:

A Free Chinese Speech Corpus Released by CSLT@Tsinghua University.

THCHS30 is an open Chinese speech database published by the Center for Speech and Language Technology (CSLT) at Tsinghua University. The original recording was conducted in 2002 by Dong Wang, supervised by Prof. Xiaoyan Zhu, at the Key State Lab of Intelligence and System, Department of Computer Science, Tsinghua University, and the original name was 'TCMSD', standing for 'Tsinghua Continuous Mandarin Speech Database'. The publication after 13 years was initiated by Dr. Dong Wang and supported by Prof. Xiaoyan Zhu. We hope to provide a toy database for new researchers in the field of speech recognition. Therefore, the database is totally free to academic users. You can cite the data using the following BibTeX entry:

@misc{THCHS30_2015,
  title={THCHS-30 : A Free Chinese Speech Corpus},
  author={Dong Wang and Xuewei Zhang and Zhiyong Zhang},
  year={2015},
  url={http://arxiv.org/abs/1512.01882}
}

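Reference: Today, thanks to Dr. Dong Wang of the CSLT lab at Tsinghua University sharing this data, Kaldi finally has a free Chinese speech recognition example (GitHub link). You can use it as a starting point to train your own models.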

Discussion of using the corpus: https://github.com/Rayhane-mamah/Tacotron-2/issues/18

Corpus paper: THCHS-30: A Free Chinese Speech Corpus

First released: 2000 - 2001 [13] [Dong Wang, Dalei Wu, and Xiaoyan Zhu, "TCMSD: a new Chinese continuous speech database," in International Conference on Chinese