Research of Random Forests Combining with Factor Analysis

doi:10.3778/j.issn.1002-8331.1808-0266

Abstract

Because feature importance is unbalanced, a random forest may randomly draw a subset of weak features and grow a “weak decision tree”, which slows the model's convergence and degrades its performance. To address this, this paper proposes a random forest model that incorporates factor analysis. Its main innovation is to construct feature groups by factor analysis and then, at each split node, randomly draw features from the groups in proportion to the number of features in each group to form the candidate subset. Using accuracy and running time in classification prediction, regression fitting, and feature importance analysis as evaluation metrics, nine UCI datasets are selected to assess the model's overall performance, with decision trees and standard random forests as experimental baselines. The results show that the random forest model incorporating factor analysis largely eliminates low-accuracy decision trees, improves accuracy and convergence speed, generalizes better, and is better suited to high-dimensional big data, making it feasible and effective.
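The split-node sampling idea described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how features are assigned to groups or how per-group quotas are rounded, so the sketch assumes each feature joins the factor on which it has the largest absolute loading, and it uses largest-remainder rounding for the quotas. The function names `group_features_by_loading` and `sample_candidates` and the loading matrix are hypothetical.

```python
import random


def group_features_by_loading(loadings):
    """Assign each feature to the factor with its largest absolute loading.

    loadings[i][j] is an assumed factor loading of feature i on factor j.
    Returns a dict mapping factor index -> list of feature indices.
    """
    groups = {}
    for feature, row in enumerate(loadings):
        factor = max(range(len(row)), key=lambda j: abs(row[j]))
        groups.setdefault(factor, []).append(feature)
    return groups


def sample_candidates(groups, m, rng):
    """Draw m candidate features with per-group quotas proportional to group size."""
    total = sum(len(g) for g in groups.values())
    m = min(m, total)
    # Integer largest-remainder allocation: quotas sum to exactly m
    # and never exceed the size of the group they are drawn from.
    quota = {k: (m * len(g)) // total for k, g in groups.items()}
    remainder = {k: (m * len(g)) % total for k, g in groups.items()}
    leftover = m - sum(quota.values())
    for k in sorted(groups, key=lambda k: remainder[k], reverse=True)[:leftover]:
        quota[k] += 1
    candidates = []
    for k, g in groups.items():
        candidates.extend(rng.sample(g, quota[k]))
    return sorted(candidates)


# Hypothetical loadings: 6 features, 2 factors.
loadings = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.70, -0.10],
    [0.85, 0.30],
    [0.10, 0.90],
    [-0.20, 0.80],
]
groups = group_features_by_loading(loadings)
candidates = sample_candidates(groups, 3, random.Random(0))
print(groups)
print(candidates)
```

With these loadings, four features fall into the first group and two into the second, so a candidate subset of size 3 always contains two features from the first group and one from the second. Every split node therefore sees each factor's features, which is the property the abstract credits with avoiding weak decision trees.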

Keywords: random forest, factor analysis, classification, regression, feature importance, traditional Chinese medicine informatics


LI Huan, XIONG Mengying, NIE Bin, DU Jianqiang, ZHOU Li, HUANG Qiang. Research of Random Forests Combining with Factor Analysis[J]. Computer Engineering and Applications, 2019, 55(23): 125-130.


