Computer Engineering and Applications ›› 2015, Vol. 51 ›› Issue (7): 149-153.

• Database, Data Mining, Machine Learning •

Gender identification based on improved diverse density

GU Mingliang (顾明亮)1,2, ZHANG Shixing (张世形)1, BAO Wei (鲍薇)2

  1. School of Physics & Electronic Engineering, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
  2. School of Linguistic Science, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
  • Online: 2015-04-01  Published: 2015-03-31

Abstract: To avoid the heavy computation needed to obtain a classifier threshold and to improve the efficiency of gender identification, a gender identification method based on improved diverse density is proposed. The method labels the male and female training voice bags multiple times under two classes, then applies the Expectation Maximization Diverse Density (EM-DD) algorithm for multi-instance learning to obtain two diverse density points, which together form a double-point language model. An instance-nearest-neighbour classification algorithm is proposed, which selects multiple instances for pattern classification. The method takes into account the influence of both male and female voice samples on an unknown voice bag, requires no threshold setting, reduces the influence of outlier instances, and ultimately improves the recognition efficiency of the system.

Key words: multi-instance learning, gender identification, Expectation Maximization Diverse Density (EM-DD), instance-nearest neighbour, k-nearest neighbour
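
The abstract gives no formulas, so the following is only a minimal sketch of how a threshold-free decision with two concept points might look. It assumes that the two diverse density points have already been learned (for example by EM-DD), that an unknown voice bag is an array of feature vectors, and that distance is plain squared Euclidean. The function name `classify_bag`, the parameter `k`, and the decision rule (mean distance of the k nearest instances to each point) are hypothetical illustrations, not the authors' algorithm.

```python
import numpy as np

def classify_bag(bag, male_point, female_point, k=3):
    """Hypothetical instance-nearest-neighbour rule (illustrative assumption only)."""
    bag = np.asarray(bag, dtype=float)
    male_point = np.asarray(male_point, dtype=float)
    female_point = np.asarray(female_point, dtype=float)
    k = min(k, len(bag))
    # Squared Euclidean distance from every instance to each concept point.
    d_male = np.sum((bag - male_point) ** 2, axis=1)
    d_female = np.sum((bag - female_point) ** 2, axis=1)
    # Use only the k instances closest to each point, so that instances
    # far from both points (outliers) do not enter the score.
    score_male = np.sort(d_male)[:k].mean()
    score_female = np.sort(d_female)[:k].mean()
    # The two scores are compared directly, so no threshold has to be tuned.
    return "male" if score_male < score_female else "female"

# Toy data (invented for illustration): three instances near the male
# point and one outlier far from both points.
toy_bag = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.1], [10.0, 10.0]]
print(classify_bag(toy_bag, male_point=[0.0, 0.0], female_point=[4.0, 4.0], k=3))
```

With this toy bag, `k=3` leaves the outlier out of the male score and the bag is labelled "male", whereas `k=4` (using every instance) lets the outlier dominate and flips the label to "female". That is the kind of outlier sensitivity the abstract says selecting multiple nearest instances is meant to reduce.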