Computer Engineering and Applications ›› 2018, Vol. 54 ›› Issue (19): 62-67. DOI: 10.3778/j.issn.1002-8331.1708-0392

• Theory, Research and Development •

Edit Distance Classification of the Languages of the Dong-Tai Family

ZHAO Zhijing1, JIANG Di2

  1. Yangzhou University, Yangzhou, Jiangsu 225009
  2. Chinese Academy of Social Sciences, Beijing 100081
  • Publication date: 2018-10-01  Release date: 2018-10-19

Classification of Levenshtein distance of Dong-Tai language family languages

ZHAO Zhijing1, JIANG Di2

  1. Yangzhou University, Yangzhou, Jiangsu 225009, China
  2. Chinese Academy of Social Sciences, Beijing 100081, China
  • Online: 2018-10-01  Published: 2018-10-19

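Abstract: Edit distance is a distance measure derived from the number of edit operations required to transform one string into another. Because it can classify languages automatically, the method has attracted considerable attention in the West in recent years and has been shown to be effective for measuring distances between languages or dialects. This paper applies the edit distance algorithm to produce a quantitative classification of the Dong-Tai languages and to describe their degree of genetic relatedness. The results show that the edit distance classification is largely consistent with the classification of historical linguistics, offering a new approach for quantitative methods. Edit distance can be applied to the study of East Asian languages.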

Keywords: Dong-Tai language family, edit distance, language classification

Abstract: The Levenshtein distance is a distance metric derived from the number of edit operations needed to transform one string into another. In recent years it has received attention in Western countries as a means of automatically classifying languages into genealogical subgroups, and it has proven effective for measuring distances between languages or dialects. This paper applies the Levenshtein distance algorithm to the computational classification of the languages of the Dong-Tai family and describes their degree of genetic relationship. The results show that the classification obtained from the Levenshtein distance is largely consistent with that of historical linguistics, offering a new approach for computational methods. The Levenshtein distance can be applied to research on East Asian languages.

Key words: Dong-Tai language family, Levenshtein distance, language classification
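
The edit-operation count described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the length normalization, the averaging over aligned word lists, the function names, and the toy word lists (labels `L1` to `L3`, invented strings, not Dong-Tai data) are all assumptions made here for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumption: divide by the longer length so long words do not dominate."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(words_a, words_b) -> float:
    """Assumption: mean normalized distance over word lists aligned by meaning."""
    pairs = list(zip(words_a, words_b))
    return sum(normalized_distance(x, y) for x, y in pairs) / len(pairs)


def distance_matrix(wordlists):
    """Pairwise distances keyed by (label, label) in sorted label order."""
    return {
        (p, q): language_distance(wordlists[p], wordlists[q])
        for p, q in combinations(sorted(wordlists), 2)
    }


# Hypothetical toy word lists: invented strings, not real language data.
WORDLISTS = {
    "L1": ["kata", "nim", "sol"],
    "L2": ["kata", "nim", "sul"],
    "L3": ["taka", "lom", "sil"],
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(WORDLISTS).items():
        print(pair, round(dist, 3))
```

A matrix of this kind is what a clustering step would take as input to group the varieties; which clustering procedure the paper uses is not stated in the abstract, so none is sketched here.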