Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (9): 41-44.

• Theoretical Research, R&D Design •

权核Logistic回归模型的分类和特征选择算法

池光辉1,刘建伟1,李卫民2,罗雄麟1   

  1. 中国石油大学(北京) 自动化研究所,北京 102249
  2. 上海大学 计算机工程与科学学院,上海 200072
  • 出版日期:2013-05-01 发布日期:2016-03-28

Classifier and feature selection algorithm by kernel-weighted Logistic regression model

CHI Guanghui1, LIU Jianwei1, LI Weimin2, LUO Xionglin1   

  1. Research Institute of Automation, China University of Petroleum, Beijing 102249, China
  2. School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China
  • Online: 2013-05-01   Published: 2016-03-28

摘要: 监督学习情况下,经常遇到样例的维数远远大于样本个数的学习情况。此时,样例中存在许多与样例类标签无关的特征,研究如何同时实现稀疏特征选择并具有更好的分类性能的算法具有优势。提出了基于权核逻辑斯蒂非线性回归模型的分类和特征选择算法。权对角矩阵的对角元素在0到1之间取值,对角元素的取值作为学习参数由最优化过程确定,讨论了提出的快速轮转优化算法。提出的算法在十个实际数据集上进行了测试,实验结果显示,提出的分类算法与L1,L2,Lp正则化逻辑斯蒂模型分类算法比较具有优势。

关键词: 权矩阵, 逻辑斯蒂回归, 特征选择, 非线性模型, 核函数

Abstract: In supervised learning, one often encounters problems in which the dimension of the samples is far larger than the number of samples, so that many features are irrelevant to the class labels. In such cases, approaches that simultaneously achieve sparse feature selection and better classification accuracy are preferable. This paper proposes a classification and feature selection algorithm based on a kernel-weighted nonlinear logistic regression model. Each diagonal element of the diagonal weight matrix takes a value between 0 and 1 and is treated as a learning parameter determined by the optimization procedure; a fast alternating optimization algorithm is discussed. The proposed algorithm is tested on ten real-world datasets. The experimental results indicate that it achieves higher classification accuracy on these datasets than L1-, L2-, and Lp-norm regularized logistic regression classifiers.

Key words: weight matrix, logistic regression, feature selection, nonlinear model, kernel function
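
The abstract does not give the model's equations, so the following is only a rough sketch of how a kernel logistic regression with a learned diagonal feature-weight matrix and alternating updates might look. Everything specific here is an assumption rather than the authors' formulation: the Gaussian kernel with per-feature weights `w`, the ridge penalty on the kernel coefficients `alpha`, the L1 penalty on `w`, the step sizes, and the function names `weighted_rbf`, `fit`, and `predict_proba`.

```python
import numpy as np

def sigmoid(t):
    # Clip the argument so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def weighted_rbf(X, Z, w, gamma=1.0):
    # Assumed kernel: k(x, z) = exp(-gamma * sum_j w_j * (x_j - z_j)^2).
    # A weight w_j = 0 removes feature j from the kernel entirely.
    sq = (X[:, None, :] - Z[None, :, :]) ** 2        # shape (n, m, d)
    return np.exp(-gamma * (sq * w).sum(axis=2)), sq

def fit(X, y, gamma=1.0, lam_alpha=1e-2, lam_w=1e-2, lr_w=0.1, n_iter=100):
    # Alternating scheme (an assumption, not the paper's exact algorithm):
    #   1) gradient step on alpha with w fixed,
    #   2) projected gradient step on w with alpha fixed, clipped to [0, 1].
    # Labels y are expected in {0, 1}.
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.full(d, 0.5)
    for _ in range(n_iter):
        K, sq = weighted_rbf(X, X, w, gamma)

        # Step 1: mean logistic loss + (lam_alpha / 2) * ||alpha||^2.
        # The step size is the inverse of an upper bound on the curvature.
        step = 1.0 / (np.linalg.norm(K, 2) ** 2 / (4.0 * n) + lam_alpha)
        p = sigmoid(K @ alpha)
        alpha = alpha - step * (K.T @ (p - y) / n + lam_alpha * alpha)

        # Step 2: mean logistic loss + lam_w * sum(w), with w >= 0.
        # dK_ik / dw_j = -gamma * sq_ikj * K_ik.
        p = sigmoid(K @ alpha)
        M = ((p - y) / n)[:, None] * K * alpha[None, :]
        grad_w = -gamma * np.einsum("ik,ikj->j", M, sq) + lam_w
        w = np.clip(w - lr_w * grad_w, 0.0, 1.0)
    return alpha, w

def predict_proba(X_train, alpha, w, X_new, gamma=1.0):
    # Probability of class 1 for each row of X_new.
    K_new, _ = weighted_rbf(X_new, X_train, w, gamma)
    return sigmoid(K_new @ alpha)
```

In this sketch, features whose weight is driven to 0 drop out of the kernel, which is one way the feature selection described in the abstract could arise; the clipping step keeps every weight inside the stated range of 0 to 1.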