Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (3): 148-156. DOI: 10.3778/j.issn.1002-8331.2305-0204

• Theory, Research and Development •

Research on Granule Vectors Random Forest Classification Algorithm

ZHANG Kunbin, CHEN Yumin, WU Keshou, HOU Xianyu   

  1. College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, Fujian 361024, China
  • Online: 2024-02-01 Published: 2024-02-01

粒向量驱动的随机森林分类算法研究

张锟滨,陈玉明,吴克寿,侯贤宇   

  1. 厦门理工学院 计算机与信息工程学院,福建 厦门 361024

Abstract: Granular computing is a computational paradigm that matches human cognitive characteristics and handles complex data effectively. Random forest lowers the overfitting risk of a single decision tree by ensembling many trees, yet some overfitting remains. To reduce overfitting and improve classification performance, this work introduces granular vector representation into random forest and proposes a random forest classification algorithm based on granular vectors. Granular vectors can represent high-dimensional features and therefore capture more data patterns; random selection of the reference samples helps control overfitting; and assigning different granular vectors to different decision trees increases model diversity. Experimental results show that, compared with the traditional random forest and other improved algorithms, the proposed algorithm generalizes better and improves classification accuracy, which supports its correctness and effectiveness.

Key words: granular computing, granular vector, random forest, ensemble learning
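
The abstract does not spell out the granulation step, so what follows is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a granule is the per-feature absolute distance between a sample and a randomly chosen reference sample (one reading of the abstract's random sample selection), and that a granular vector concatenates these distances over a few reference samples. The names `granulate`, `fit_granular_forest`, and `predict`, the distance measure, and the parameters `n_trees` and `n_refs` are all hypothetical. scikit-learn's `DecisionTreeClassifier` stands in for the base learner.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def granulate(X, refs):
    """Map each sample to a granular vector.

    Assumed form (not from the paper): |x - r| for every feature and
    every reference sample r, flattened into one vector per sample.
    """
    # X: (n_samples, n_features), refs: (n_refs, n_features)
    diffs = np.abs(X[:, None, :] - refs[None, :, :])
    return diffs.reshape(X.shape[0], -1)


def fit_granular_forest(X, y, n_trees=25, n_refs=3, seed=0):
    """Train trees that each see a different granular representation."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        # Random reference samples give each tree its own granular vectors.
        refs = X[rng.choice(len(X), size=n_refs, replace=False)]
        # Bootstrap resampling of the training set, as in a standard random forest.
        boot = rng.integers(0, len(X), size=len(X))
        tree = DecisionTreeClassifier(
            max_features="sqrt",
            random_state=int(rng.integers(1 << 31)),
        )
        tree.fit(granulate(X[boot], refs), y[boot])
        forest.append((refs, tree))
    return forest


def predict(forest, X):
    """Majority vote over trees; assumes integer labels 0..K-1."""
    votes = np.stack([tree.predict(granulate(X, refs)) for refs, tree in forest])
    return np.array([np.bincount(col).argmax() for col in votes.T.astype(int)])
```

Each tree keeps the reference samples it was trained with, because a new sample has to be granulated against the same references before that tree can classify it.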
