Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (20): 168-171. DOI: 10.3778/j.issn.1002-8331.2010.20.047

• Artificial Intelligence •

Approach to optimize threshold of ANN on imbalance datasets

LI Ming-fang, ZHANG Hua-xiang, ZHANG Wen, JI Hua

  1. School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
  • Received: 2010-04-14 Revised: 2010-05-17 Online: 2010-07-11 Published: 2010-07-11
  • Contact: LI Ming-fang

不平衡数据集的神经网络阈值优化方法

李明方,张化祥,张 雯,计 华   

  1. 山东师范大学 信息科学与工程学院,济南 250014
  • 通讯作者: 李明方

Abstract: Classification of imbalanced datasets is an active research area in machine learning, and researchers have recently proposed many theories and algorithms to improve the performance of classical classification algorithms on such data. One important approach uses a threshold selection criterion to determine the output threshold of an Artificial Neural Network (ANN). Commonly used threshold selection criteria have drawbacks: they cannot achieve the best classification accuracy on the minority class and the majority class at the same time, and they favor the accuracy of the majority class. This paper proposes a new threshold selection criterion under which both the minority class and the majority class can reach their best classification accuracies, independent of the class proportions in the sample. Classifiers are trained by combining an ANN with a genetic algorithm, and the new criterion, used both to select the threshold and to evaluate the classifiers, gives good results.

Key words: imbalanced datasets, threshold selection criterion, Artificial Neural Network (ANN), genetic algorithm
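
The abstract does not give the formula of the proposed criterion, so the following is only a minimal sketch of the general idea of choosing an ANN output threshold with a selection criterion. It assumes a binary problem, a made-up set of network output scores, and the geometric mean of the two per-class accuracies (G-mean) as a stand-in criterion; G-mean is a common choice for imbalanced data and is not claimed to be the criterion proposed in the paper. All function and variable names are hypothetical.

```python
from math import sqrt

# Toy data (made up for illustration): 1 = minority class, 0 = majority class.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical ANN output scores in [0, 1]; higher means "more likely minority".
scores = [0.4, 0.6, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.5, 0.55]


def class_accuracies(scores, labels, threshold):
    """Return (minority accuracy, majority accuracy) at the given threshold.

    A sample is predicted as minority when its score is >= threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def overall_accuracy(scores, labels, threshold):
    """Fraction of all samples classified correctly; dominated by the majority class."""
    correct = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == (y == 1))
    return correct / len(labels)


def g_mean(scores, labels, threshold):
    """Stand-in criterion (an assumption): geometric mean of the per-class accuracies."""
    minority_acc, majority_acc = class_accuracies(scores, labels, threshold)
    return sqrt(minority_acc * majority_acc)


def best_threshold(scores, labels, criterion):
    """Try every observed score as a threshold and keep the one the criterion rates highest."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: criterion(scores, labels, t))


t_acc = best_threshold(scores, labels, overall_accuracy)
t_gm = best_threshold(scores, labels, g_mean)
print("overall accuracy:", t_acc, class_accuracies(scores, labels, t_acc))
print("G-mean:", t_gm, class_accuracies(scores, labels, t_gm))
```

On this toy data, overall accuracy selects the threshold 0.6, which classifies every majority sample correctly but only half of the minority samples. The G-mean stand-in selects 0.4, which recovers all minority samples at the cost of some majority accuracy. This is the kind of majority-class bias the abstract attributes to commonly used criteria.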

摘要: 不平衡数据集分类为机器学习热点研究问题之一,近年来研究人员提出很多理论和算法以改进传统分类技术在不平衡数据集上的性能,其中用阈值判定标准确定神经网络中的阈值是重要的方法之一。常用的阈值判定标准存在一定缺点,如不能使少数类及多数类分类精度同时取得最好、过于偏好多数类的精度等。为此提出一种新的阈值判定标准,依据该标准能够使少数类及多数类分类精度同时取得最好而不受样例类别比例的影响。以神经网络与遗传算法相结合训练分类器,作为阈值选择条件和分类器的评价标准,新标准能够得到较好的结果。

关键词: 不平衡数据集, 阈值判定标准, 神经网络, 遗传算法
