Computer Engineering and Applications ›› 2016, Vol. 52 ›› Issue (10): 161-166.

Chinese dialect identification based on combination diverse density

GU Mingliang1,2, ZHANG Shixing1, ZHANG Hao1, ZHANG Ning1   

  1. School of Physics & Electronic Engineering, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
  2. School of Linguistic Science, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
  • Online: 2016-05-15 Published: 2016-05-16

基于联合多样性密度的汉语方言辨识

顾明亮1,2,张世形1,张  浩1,张  宁1   

  1. 江苏师范大学 物理与电子工程学院,江苏 徐州 221116
  2. 江苏师范大学 语言科学学院,江苏 徐州 221116

Abstract: To address the overly uniform design of Chinese dialect models and to improve the performance of dialect identification, an approach to Chinese dialect identification based on combination diverse density is presented. Diverse density is a classical multi-instance learning algorithm, and combination diverse density is an improved application of it. The method first pre-classifies each dialect into several sub-classes and generates multi-instance bags for each sub-class. It then applies EM-DD (expectation-maximization diverse density) for multi-instance learning, and the resulting set of diverse density points serves as the multi-instance model of the dialect. Finally, an average nearest distance algorithm is proposed for pattern classification. The method yields a more comprehensive and complete dialect model during training and takes into account the influence of every instance in an unseen bag during classification, thereby improving the efficiency of the identification system.

Key words: Chinese dialect identification, multi-instance learning, diverse density, k-means, average nearest distance

摘要: 为了解决汉语方言模型设计较为单一的问题,提高方言辨识的效率,提出了一种基于联合多样性密度的汉语方言辨识方法。多样性密度算法是多示例学习中的一种经典算法,联合多样性密度算法是对其的改进应用。该方法首先将方言进行预分类为多个小类,然后将各小类方言进行多示例包生成,并通过期望最大多样性密度算法进行多示例学习,得到的多个多样性密度点作为方言的多示例模型,最后提出平均最近距离算法进行模式分类。该方法在训练模型时得到的方言模型更为全面、完整,在模式分类时考虑了未知包中每个示例的影响,提高了辨识系统的效率。

关键词: 汉语方言辨识, 多示例学习, 多样性密度, k近邻, 平均最近距离
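
The abstract names an average nearest distance rule for classification but does not define it. The sketch below shows one plausible reading, offered only as an illustration: for each dialect, every instance in the unseen bag is matched to its closest diverse density point, those distances are averaged over the bag, and the dialect with the smallest average is chosen. The Euclidean metric, the function names, the dialect labels, and the toy two-dimensional points are all assumptions made for this example; the diverse density points are assumed to have been produced already by EM-DD.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_nearest_distance(bag, concept_points):
    """Mean, over the instances in a bag, of the distance to the closest concept point.

    `concept_points` stands in for the diverse density points of one dialect
    model (assumed to come from EM-DD; not computed here).
    """
    nearest = [min(euclidean(inst, c) for c in concept_points) for inst in bag]
    return sum(nearest) / len(nearest)


def classify_bag(bag, models):
    """Return the label whose concept points are closest to the bag on average."""
    return min(models, key=lambda label: average_nearest_distance(bag, models[label]))


# Hypothetical toy models: two diverse density points per dialect.
models = {
    "dialect_a": [(0.0, 0.0), (1.0, 1.0)],
    "dialect_b": [(5.0, 5.0), (6.0, 4.0)],
}

# An unseen bag with three instances; every instance contributes to the score.
unseen_bag = [(0.0, 1.0), (1.0, 2.0), (4.0, 4.0)]

predicted = classify_bag(unseen_bag, models)
print(predicted)
```

Averaging over all instances, rather than scoring a bag by its single best instance, is what lets every instance in the unseen bag influence the decision, which is the property the abstract highlights.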