Computer Engineering and Applications, 2019, Vol. 55, Issue (6): 8-12. DOI: 10.3778/j.issn.1002-8331.1808-0257


Research on Algorithm of Nonparametric Kernel Density for Discriminant Analysis of Multidimensional Data

SHI Kai1,2, NIE Fuqiang1, SUN Feng2   

  1. School of Statistics, Southwest University of Finance and Economics, Chengdu 611130, China
  2. College of Mathematics and Information Science, Leshan Normal University, Leshan, Sichuan 614000, China
  • Online: 2019-03-15 Published: 2019-03-14

多维数据判别分析的非参核密度算法研究

石  凯1,2,聂富强1,孙  峰2   

  1. 西南财经大学 统计学院,成都 611130
  2. 乐山师范学院 数学与信息科学学院,四川 乐山 614000

Abstract: Discriminant analysis is widely used in data mining and pattern recognition. Making full use of the information in the training set, improving the algorithm behind the discriminant rule, and reducing the misclassification rate have long been the focus of much research. Some traditional discriminant algorithms first assume a distribution type for the data in order to build the discriminant rule, but the structure of multidimensional data often violates that assumption, which leads to a higher misclassification rate. To address this problem, this paper proposes establishing discriminant rules for multidimensional data with a nonparametric kernel density algorithm, and carries out an empirical analysis on the Iris and Seeds data sets. The results show that, compared with existing discriminant analysis algorithms, the proposed algorithm makes fuller use of the sample information and significantly improves discrimination accuracy on multidimensional data. Because the algorithm is not restricted by distributional assumptions, it is widely applicable.

Key words: multidimensional data, discriminant analysis, nonparametric statistics, kernel function, probability density
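
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of a kernel density discriminant rule, not the authors' method. It assumes a product Gaussian kernel, per-dimension bandwidths from Scott's rule, and class priors estimated from training frequencies; all of these choices, the function names (`scott_bandwidths`, `kde_log_density`, `fit`, `predict`), and the synthetic two-class data are illustrative assumptions. A point is assigned to the class whose prior times estimated density is largest.

```python
import math
import random


def scott_bandwidths(points):
    """Per-dimension bandwidths via Scott's rule (an assumed choice):
    h_j = sigma_j * n^(-1/(d+4))."""
    n, d = len(points), len(points[0])
    factor = n ** (-1.0 / (d + 4))
    bandwidths = []
    for j in range(d):
        col = [p[j] for p in points]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        # Guard against a zero bandwidth when a feature is constant.
        bandwidths.append(max(math.sqrt(var) * factor, 1e-9))
    return bandwidths


def kde_log_density(x, points, bandwidths):
    """Log of a product-Gaussian kernel density estimate at x."""
    d = len(x)
    log_norm = -sum(math.log(h) for h in bandwidths) - 0.5 * d * math.log(2 * math.pi)
    exponents = [
        -0.5 * sum(((x[j] - p[j]) / bandwidths[j]) ** 2 for j in range(d))
        for p in points
    ]
    # Log-sum-exp keeps the sum stable when x is far from every sample.
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    return log_norm + log_sum - math.log(len(points))


def fit(train_x, train_y):
    """Store, per class: its samples, its bandwidths, and its log prior."""
    by_class = {}
    for x, y in zip(train_x, train_y):
        by_class.setdefault(y, []).append(x)
    n = len(train_x)
    return {
        label: (pts, scott_bandwidths(pts), math.log(len(pts) / n))
        for label, pts in by_class.items()
    }


def predict(model, x):
    """Assign x to the class maximizing log prior + log estimated density."""
    def score(label):
        pts, bandwidths, log_prior = model[label]
        return log_prior + kde_log_density(x, pts, bandwidths)
    return max(model, key=score)


# Synthetic two-class, two-dimensional data (hypothetical, for illustration only).
rng = random.Random(0)
train_x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
train_x += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(40)]
train_y = ["a"] * 40 + ["b"] * 40

model = fit(train_x, train_y)
print(predict(model, (0.0, 0.0)), predict(model, (5.0, 5.0)))
```

Because each class density is estimated directly from its own samples, the rule needs no normality or equal-covariance assumption; the price is that the result depends on the bandwidth choice, which the sketch fixes by a rule of thumb rather than tuning.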

摘要: 判别分析在数据挖掘、识别中有着广泛的应用,其中充分利用训练集的信息,改进判别规则算法,降低误判率一直是众多研究关注的焦点。传统的一些判别算法中,往往事先假定数据的分布类型来建立判别规则,但多维数据结构往往存在违背假定的情形,从而导致较高的误判率。针对此类问题,提出采用非参核密度算法建立多维数据的判别规则,同时通过Iris数据和Seeds数据进行实证分析。结果表明,与现有的判别分析算法相比较,所提判别算法利用样本资料信息更充分,显著提高了多维数据的判别精度,并且该算法不受分布假定的限制,具有广泛的适用性。

关键词: 多维数据, 判别分析, 非参数统计, 核函数, 概率密度