A free Chinese speech corpus released by CSLT@Tsinghua University.
A free Chinese Mandarin read-speech corpus by Surfingtech (www.surfing.ai), containing 102,600 utterances from 855 speakers (listed as 500 hours in total).
A Chinese Mandarin corpus released by Shanghai Primewords Co. Ltd. (www.primewords.cn), containing 100 hours of speech data.
A Chinese Mandarin read-speech corpus by Beijing DataTang Technology Co., Ltd, containing 200 hours of speech data from 600 speakers. The transcription accuracy for each sentence is higher than 98%.
A corpus by Magic Data Technology Co., Ltd., containing 755 hours of scripted read speech from 1080 native speakers of Mandarin Chinese as spoken in mainland China. The sentence transcription accuracy is higher than 98%.
A free Chinese speaker recognition corpus released by CSLT@Tsinghua University.
A Chinese hotword detection dataset provided by Mobvoi Co., Ltd.