Multi-task coupled logistic regression and its fast implementation for large multi-task datasets

doi:10.3778/j.issn.1002-8331.1603-0450

Abstract

Abstract: In multi-task learning, a method should identify the relevant input-output features, share what is common across task domains, and scale to large multi-task datasets. This paper introduces a multi-task coupled logistic regression framework, MTC-LR, which builds one classifier per task while sharing the commonality among the task domains. MTC-LR assigns each task domain its own logistic-regression classifier. In contrast to SVM-based proposals, it learns the parameter vectors of all classifiers jointly with the conjugate gradient method, without the kernel trick, and it extends easily to a scaled version. The paper shows theoretically that adding a term to the cost function of the set of logistic regressions, one that penalizes the diversity among tasks, couples the tasks and allows MTC-LR to improve learning performance while remaining within the logistic-regression framework. Because of this, MTC-LR can be readily integrated with CDdual, a state-of-the-art fast logistic regression algorithm, to obtain a fast version, MTC-LR-CDdual, for large multi-task datasets. MTC-LR-CDdual is also analyzed theoretically. Experimental results on artificial and real datasets indicate its effectiveness in terms of classification accuracy, speed, and robustness.

Key words: multi-task classification learning, logistic regression, posterior probability, dual coordinate descent method

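Abstract (translated from the Chinese): Multi-task learning works by finding and sharing the features common to different task domains; it uses knowledge transfer to accelerate learning across the task domains and builds a classifier for each one. This paper proposes MTC-LR (Multi-task Coupled Logistic Regression), a multi-task learning method based on the logistic regression model. The logistic regression model has been applied successfully to single-task classifiers and has been shown to be effective in numerous experiments, which motivates the approach. It is shown theoretically that, by constructing a cost function and a diversity measure function for the multi-task classifiers, MTC-LR can improve the classification accuracy of each task's classifier. Compared with traditional SVM-based multi-task learning methods, MTC-LR does not rely on kernel methods; instead it finds the optimal parameters of each classifier by the conjugate gradient method. MTC-LR also combines readily with CDdual, a fast algorithm for logistic regression, and so extends to multi-task classification on large samples. Building on these findings, and to use large multi-task datasets efficiently and meet the need for fast computation on them, the paper combines MTC-LR with the recent CDdual (the dual coordinate descent method) algorithm to obtain a fast version, MTC-LR-CDdual, and analyzes it theoretically. The algorithm is validated on artificial and real datasets; the experimental results show a high recognition rate, fast recognition speed, and good robustness.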

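Key words (translated from the Chinese): multi-task classification, logistic regression, posterior probability, dual coordinate descent method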
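
The coupling idea in the abstract, per-task logistic regressions tied together by a term that penalizes diversity among tasks, can be illustrated with a small sketch. The abstract does not state the exact cost function, so the form used here is an assumption: the sum of per-task mean log-losses plus (lam / 2) times the squared distance of each task's weight vector from the mean weight vector. All function names, the synthetic data, and the parameters `lam`, `lr`, and `steps` are hypothetical. Plain gradient descent stands in for the conjugate gradient solver, and the CDdual-based fast version is not reproduced.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mean_vector(W):
    """Average of the per-task weight vectors."""
    return [sum(col) / len(W) for col in zip(*W)]


def objective(W, tasks, lam):
    """Assumed cost: per-task mean log-loss + (lam / 2) * sum_t ||w_t - w_bar||^2."""
    w_bar = mean_vector(W)
    total = 0.0
    for w, (X, y) in zip(W, tasks):
        for x, label in zip(X, y):
            p = min(max(sigmoid(dot(w, x)), 1e-12), 1.0 - 1e-12)
            total -= (label * math.log(p) + (1 - label) * math.log(1.0 - p)) / len(X)
        total += 0.5 * lam * sum((wj - bj) ** 2 for wj, bj in zip(w, w_bar))
    return total


def fit(tasks, lam=1.0, lr=0.5, steps=300):
    """Plain gradient descent on all task weight vectors jointly.

    Stand-in for the conjugate gradient solver mentioned in the abstract.
    """
    dim = len(tasks[0][0][0])
    W = [[0.0] * dim for _ in tasks]
    for _ in range(steps):
        w_bar = mean_vector(W)
        updated = []
        for w, (X, y) in zip(W, tasks):
            # Gradient of the coupling term with respect to w_t is lam * (w_t - w_bar).
            grad = [lam * (wj - bj) for wj, bj in zip(w, w_bar)]
            for x, label in zip(X, y):
                err = (sigmoid(dot(w, x)) - label) / len(X)
                for j, xj in enumerate(x):
                    grad[j] += err * xj
            updated.append([wj - lr * gj for wj, gj in zip(w, grad)])
        W = updated
    return W


def make_task(rng, true_w, n=200):
    """Synthetic task (hypothetical): two Gaussian features plus a constant bias feature."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1), 1.0] for _ in range(n)]
    y = [1 if dot(true_w, x) > 0 else 0 for x in X]
    return X, y


def spread(W):
    """Euclidean distance between the first two task weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(W[0], W[1])))


rng = random.Random(0)
tasks = [make_task(rng, [2.0, -1.0, 0.5]), make_task(rng, [1.0, -2.0, -0.5])]
W_indep = fit(tasks, lam=0.0)    # no coupling: each task is fitted on its own
W_coupled = fit(tasks, lam=1.0)  # coupling pulls the task weights together
print("spread without coupling:", spread(W_indep))
print("spread with coupling:   ", spread(W_coupled))
```

With `lam=0.0` the tasks decouple into independent logistic regressions. Raising `lam` pulls the task weight vectors toward their mean, which is one simple way to express sharing of commonality among tasks under the assumed penalty.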

GU Xin, CAO Danhua, WU Yubin, LUAN Yongxin, WANG Weicheng. Multi-task coupled logistic regression and its fast implementation for large multi-task datasets[J]. Computer Engineering and Applications, 2017, 53(15): 47-56.

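GU Xin, CAO Danhua, WU Yubin, LUAN Yongxin, WANG Weicheng. Fast classification learning algorithm for multi-task domains based on logistic regression (Chinese title: 基于逻辑回归的多任务域快速分类学习算法)[J]. Computer Engineering and Applications, 2017, 53(15): 47-56.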

