Computer Engineering and Applications, 2018, Vol. 54, Issue 3: 150-159. DOI: 10.3778/j.issn.1002-8331.1608-0411

• Pattern Recognition and Artificial Intelligence •

Clustering ensemble with weighted voting based on feature correlation

JIANG Zhiliang, HOU Yuan, WU Min

  1. School of Computer Science and Software Engineering, East China Normal University, Shanghai 200062, China
  • Online: 2018-02-01  Published: 2018-02-07

Abstract: For complex data with many features, a clustering ensemble that uses sub-datasets as the inputs of its clustering members and combines them by weighted voting can balance members of differing quality and improve the accuracy and stability of clustering. Addressing how the sub-datasets are selected and how the weights are calculated, this paper proposes a sub-dataset selection method based on minimally correlated features and, on the basis of feature correlation, analyzes and compares five weight-calculation methods for clustering members. Experimental results show that selecting each member's input data with the minimal-correlation method yields higher ensemble accuracy than random sampling. Clustering ensembles built on any of the five weight-calculation methods are more accurate than single clustering, while the five methods differ markedly in time consumption.

Key words: clustering ensemble, feature selection, weighted voting
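The abstract does not spell out the selection rule, the five weighting formulas, or how the votes are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes (a) "minimal correlation" means greedily adding the feature whose largest absolute Pearson correlation with the already-selected features is smallest, with different seed features yielding different members' sub-datasets; (b) member weights are supplied from outside, standing in for any of the five weighting schemes; and (c) weighted voting is done on pairwise co-membership (a weighted co-association score thresholded at a weighted majority), a common way to vote across members whose cluster labels are not aligned. The names `pearson`, `min_correlation_subset` and `weighted_vote_consensus` are hypothetical, and the per-member clustering step (for example k-means on each sub-dataset) is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def min_correlation_subset(data, seed, size):
    """Hypothetical minimal-correlation selection (an assumption, not the paper's rule):
    starting from feature `seed`, greedily add the feature whose largest |correlation|
    with the features already chosen is smallest, until `size` features are selected."""
    columns = list(zip(*data))  # one tuple of values per feature
    chosen = [seed]
    while len(chosen) < size:
        candidates = [f for f in range(len(columns)) if f not in chosen]
        best = min(
            candidates,
            key=lambda f: max(abs(pearson(columns[f], columns[c])) for c in chosen),
        )
        chosen.append(best)
    return sorted(chosen)


def weighted_vote_consensus(labelings, weights, threshold=0.5):
    """Weighted co-association voting (an assumed stand-in for the paper's scheme).

    A pair of samples is linked when the members that put both samples in the same
    cluster carry more than `threshold` of the total weight; connected components of
    the resulting graph are returned as consensus labels 0, 1, 2, ..."""
    n = len(labelings[0])
    total = sum(weights)
    parent = list(range(n))  # union-find forest over samples

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(w for lab, w in zip(labelings, weights) if lab[i] == lab[j])
            if agree / total > threshold:
                parent[find(i)] = find(j)

    ids, consensus = {}, []
    for i in range(n):
        root = find(i)
        ids.setdefault(root, len(ids))  # number components by first appearance
        consensus.append(ids[root])
    return consensus


if __name__ == "__main__":
    # Feature 1 = 2 * feature 0, feature 3 = -feature 0, feature 2 is uncorrelated with 0.
    data = [[1, 2, 1, -1], [2, 4, -1, -2], [3, 6, -1, -3], [4, 8, 1, -4]]
    print(min_correlation_subset(data, seed=0, size=2))  # [0, 2]

    # Three members with illustrative (made-up) weights.
    members = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
    print(weighted_vote_consensus(members, weights=[0.5, 0.3, 0.2]))  # [0, 0, 1, 1, 2]
```

Voting on pairs rather than on raw labels sidesteps the label-correspondence problem: member B above uses different label names from member A but still agrees with it on which samples belong together. The strict `>` threshold means a pair backed by exactly half of the total weight stays split, as samples 3 and 4 do in the example.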