Computer Engineering and Applications, 2019, Vol. 55, Issue (17): 137-142. DOI: 10.3778/j.issn.1002-8331.1805-0448

• Pattern Recognition and Artificial Intelligence •

Multi-View Learning with Gravitational Nearest Neighbor Classifier

LI Yanqiong, LI Dongdong, WANG Zhe, ZHANG Jing

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Online: 2019-09-01 Published: 2019-08-30

Abstract: Imbalanced data pose a major challenge to traditional nearest neighbor classifiers: their criterion functions tend to bias the predicted class of a test sample toward the majority class, and their parameters depend strongly on the data set. The Gravitational Fixed Radius Nearest Neighbor classifier (GFRNN) draws on the law of universal gravitation to realize a parameter-free and efficient classifier for imbalanced data, but it uses only the Euclidean distance to compute the radius and select the candidate set. Building on GFRNN, this paper therefore proposes, at the level of algorithm construction, a multi-view learning framework: Multi-view learning with Gravitational Fixed Radius Nearest Neighbor (MGFRNN). To account for the diversity of distance computations and the uncertainty of the corresponding candidate sets, MGFRNN adopts three distance metrics: the Euclidean, L1-norm and Chebyshev distances. The candidate set radius is computed separately under each metric, and the gravitational force exerted on the test sample by the samples of each class in the candidate set is then computed to perform classification. Experimental results show that the proposed MGFRNN achieves the highest classification accuracy among the compared algorithms.

Key words: gravitation, nearest neighbor, multi-view learning, imbalanced data, machine learning
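
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the radius rule, the sample masses, or how the three views are combined. The sketch assumes that the radius is the mean distance over all training pairs, that majority samples have mass 1 and minority samples have mass equal to the imbalance ratio, that force follows an inverse-square law, and that the views are combined by majority vote. The names `pairwise_distances` and `mgfrnn_predict` are hypothetical.

```python
import numpy as np


def pairwise_distances(A, B, metric):
    """Distance matrix between rows of A and rows of B under one view."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    diff = A[:, None, :] - B[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "l1":
        return np.abs(diff).sum(axis=2)
    if metric == "chebyshev":
        return np.abs(diff).max(axis=2)
    raise ValueError(f"unknown metric: {metric}")


def mgfrnn_predict(X_train, y_train, X_test,
                   metrics=("euclidean", "l1", "chebyshev"), eps=1e-12):
    """Predict labels in {+1 (minority), -1 (majority)} for X_test."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)

    n_pos = np.sum(y_train == 1)
    n_neg = np.sum(y_train == -1)
    # Assumption: minority mass = imbalance ratio, majority mass = 1.
    mass = np.where(y_train == 1, n_neg / n_pos, 1.0)

    n = len(X_train)
    upper = np.triu_indices(n, k=1)  # each distinct training pair once
    votes = np.zeros(len(X_test))
    for metric in metrics:
        # Assumption: fixed radius = mean pairwise training distance.
        radius = pairwise_distances(X_train, X_train, metric)[upper].mean()
        d = pairwise_distances(X_test, X_train, metric)
        inside = d <= radius  # candidate set of each test sample
        # Signed inverse-square "gravitation" from candidate samples only.
        force = np.where(inside, mass * y_train / (d ** 2 + eps), 0.0)
        # An empty candidate set gives zero force, treated here as +1.
        votes += np.where(force.sum(axis=1) >= 0, 1, -1)
    # Assumption: views are combined by majority vote.
    return np.where(votes > 0, 1, -1)
```

Calling `mgfrnn_predict(X_train, y_train, X_test)` with labels encoded as +1 and -1 returns one label per test row. Summing the raw forces across views would be an alternative combination rule, but the three metrics produce distances on different scales, which is why this sketch votes instead.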