Computer Engineering and Applications, 2018, Vol. 54, Issue (2): 234-238. DOI: 10.3778/j.issn.1002-8331.1608-0042

Research and construction of an endangered language spoken corpus: a case study on Lizu

CAO Lei¹, YIN Weibin², SUN Qinyao¹, WANG Zhi³, YU Chongchong¹, LI Daowei¹

  1. College of Computer & Information Engineering, Beijing Technology & Business University, Beijing 100048, China
  2. Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081, China
  3. College of History and Culture, Sichuan University, Chengdu 610064, China
  • Online: 2018-01-15  Published: 2018-01-31

Abstract: An endangered language spoken corpus is built to preserve, in a systematic way, a language that is close to disappearing, to retain its vitality and the local culture it carries, and to make the language available for study and research. Such a corpus stores the original audio recordings, International Phonetic Alphabet (IPA) annotation, word-by-word Chinese glosses, and Chinese translation annotation. Taking the Lizu language as an example, this paper studies and builds an endangered language spoken corpus comprehensively and systematically. In addition, automatic word segmentation and keyword extraction are implemented for the annotated corpus, providing an example for the later construction of a general-purpose endangered language corpus.

Key words: endangered language, spoken language corpus, Lizu language
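
The abstract names the tiers stored for each recording and two processing steps (word segmentation and keyword extraction) but does not say how they are implemented. The following is only a minimal sketch under stated assumptions: the `CorpusEntry` record layout, the function names, and the lexicon are hypothetical, and forward maximum matching with raw frequency counts stands in for whatever methods the authors actually used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CorpusEntry:
    """One utterance with the tiers listed in the abstract (layout is assumed)."""
    audio_path: str    # original voice file
    ipa: str           # International Phonetic Alphabet annotation
    gloss: str         # word-by-word Chinese gloss
    translation: str   # free Chinese translation


def segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no lexicon word matches.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def keywords(entries, lexicon, stopwords=frozenset(), top_k=3):
    """Rank multi-character words of the translation tier by raw frequency."""
    counts = Counter()
    for entry in entries:
        for token in segment(entry.translation, lexicon):
            if token not in stopwords and len(token) > 1:
                counts[token] += 1
    return [word for word, _ in counts.most_common(top_k)]
```

Segmenting the Chinese translation tier rather than the IPA tier is a design choice of this sketch: a Chinese lexicon is easy to obtain, whereas a lexicon for the endangered language itself is part of what the corpus is meant to help produce.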
