Computer Engineering and Applications, 2019, Vol. 55, Issue (18): 38-44. DOI: 10.3778/j.issn.1002-8331.1806-0075

• Theory, Research and Development •

Improved Algorithm of Local Outlier Based on Multi-Instance Learning

DENG Hao, QIN Ling

  1. College of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
  • Online: 2019-09-15 Published: 2019-09-11

Abstract: In the multi-instance learning (MIL) framework, the training data set consists of several bags, each containing multiple instances represented as attribute-value pairs, and the system learns from the instances within each bag. Traditional MIL-based local outlier detection algorithms apply this framework to the data set and convert the multi-instance problem into a single-instance problem. However, when converting the bags, they use the proportion of feature length within each instance as the weighting mechanism. They neither examine the instances that strongly influence the result and analyze the causes, nor adjust the weights of those instances dynamically, which degrades outlier detection performance. To address this problem and adapt to the internal distribution of the data, an improved MIL-based local outlier detection algorithm, FWMIL-LOF, is proposed. The algorithm adopts the MIL framework, introduces a weight function describing the importance of the data into the bag conversion process, and adjusts that weight function through a defined penalty strategy, thereby determining the weight that instances with different feature attributes carry within their bags. Simulation analysis on the real-time acquisition and monitoring system of an actual enterprise, together with a comparison against other classical local outlier detection algorithms, verifies that the improved algorithm improves outlier detection performance.

Key words: multi-instance learning (MIL), weight mechanism, feature, penalty strategy
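
The pipeline summarized in the abstract (collapse each bag into one weighted vector, then score the vectors with a local outlier factor) can be illustrated with a minimal sketch. This is not the FWMIL-LOF implementation: the abstract does not specify the weight function or the penalty strategy, so the per-instance weights are simply passed in by the caller, and the names `collapse_bag`, `lof_scores`, and the toy data are all hypothetical. The LOF computation follows the standard textbook definition and uses only the Python standard library.

```python
import math

def collapse_bag(bag, weights):
    """Turn one bag (list of instances) into a single vector by weighted average.

    Assumption: the weights come from elsewhere. The paper's weight function
    and penalty strategy are not given in the abstract, so none is modeled here.
    """
    total = sum(weights)
    dim = len(bag[0])
    return [sum(w * inst[d] for w, inst in zip(weights, bag)) / total
            for d in range(dim)]

def lof_scores(points, k):
    """Standard local outlier factor for each point (a score well above 1 suggests an outlier).

    Assumes len(points) > k and fewer than k + 1 exact duplicates of any point,
    otherwise a reachability sum can be zero.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, neighbours = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]          # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbours.append([j for j in order if dist[i][j] <= kd])  # keeps ties
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]

# Toy data (hypothetical): five bags of two instances each, uniform weights.
bags = [
    [[1.0, 1.0], [1.2, 0.8]],
    [[0.9, 1.1], [1.1, 1.0]],
    [[1.0, 0.9], [1.0, 1.1]],
    [[1.1, 1.0], [0.9, 0.9]],
    [[5.0, 5.0], [6.0, 4.0]],   # far from the other bags
]
weights = [[0.5, 0.5] for _ in bags]

vectors = [collapse_bag(b, w) for b, w in zip(bags, weights)]
scores = lof_scores(vectors, k=2)
for i, s in enumerate(scores):
    print(f"bag {i}: LOF = {s:.2f}")
```

In this toy setup the last bag lies far from the rest and receives by far the largest score. Replacing the uniform weights with values produced by an importance function is the step where a scheme such as the one the paper proposes would plug in.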