LEI Chen, MAO Yimin. Random Forest Algorithm Based on PCA and Hierarchical Selection Under Spark[J]. Computer Engineering and Applications, 2022, 58(6): 118-127.
[1] MANTAS C J,CASTELLANO J G,MORAL-GARCÍA S,et al.A comparison of random forest based algorithms:random credal random forest versus oblique random forest[J].Soft Computing,2018,23(5):10739-10754.
[2] LI J Z,LIU X M.An important aspect of big data:data usability[J].Journal of Computer Research and Development,2013,50(6):1147-1162.
[3] KIM A,MYUNG J,KIM H.Random forest ensemble using a weight-adjusted voting algorithm[J].Journal of the Korean Data and Information Science Society,2020,31(2):427-438.
[4] HU J,HU X D,CHENG J X.Big data hybrid computing model based on Spark[J].Computer System and Applications,2015,24(4):214-218.
[5] LUNGA D,GERRAND J,YANG L,et al.Apache Spark accelerated deep learning inference for large scale satellite image analytics[J].IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing,2020,13:271-283.
[6] WU Z,LIN W,ZHANG Z,et al.An ensemble random forest algorithm for insurance big data analysis[C]//IEEE International Conference on Computational Science & Engineering,2017.
[7] AZAR A T,INBARANI H H,DEVI K R.Improved dominance rough set-based classification system[J].Neural Computing and Applications,2017,28:2231-2246.
[8] BANIA R K,HALDER A.R-ensembler:a greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data[J].Computer Methods and Programs in Biomedicine,2020,184(4).
[9] GALICIA A,TALAVERA-LLAMES R,TRONCOSO A,et al.Multi-step forecasting for big data time series based on ensemble learning[J].Knowledge-Based Systems,2018(6).
[10] LULLI A,ONETO L,ANGUITA D.Mining big data with random forests[J].Cognitive Computation,2019,11(6).
[11] MORFINO V,RAMPONE S.Towards near-real-time intrusion detection for IoT devices using supervised learning and Apache Spark[J].Engineering,Electrical & Electronic,2020,9(3).
[12] HAMSTRA M,ZAHARIA M.Learning Spark:lightning-fast big data analytics[M].[S.l.]:O'Reilly & Associates Inc,2016.
[13] YANG B X,YANG Y Q.Applying PCA to dimensionality reduction of image features extracted by deep learning[J].Computer System and Applications,2018,28(1):279-283.
[14] JIANG J Y,PENG Z Y,WU X Y,et al.Overlapping deep Web data source selection based on stratified sampling[J].Journal of Software,2017,28(5):1271-1295.
[15] RAM P,SINHA K.Revisiting Kd-tree for nearest neighbor search[C]//Proceedings of the Twenty-Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2019.
[16] JOHNSON R W.An introduction to the bootstrap[J].Teaching Statistics,2010,23(2):49-54.
[17] SARVENDRANATH R,MEHTA N B.Antenna selection with power adaptation in interference-constrained cognitive radios[J].IEEE Transactions on Communications,2014,62(3):786-796.
[18] CHEN H,CHANG P,HU Z,et al.A Spark-based ant lion algorithm for parameters optimization of random forest in credit classification[C]//2019 IEEE 3rd Information Technology,Networking,Electronic and Automation Control Conference(ITNEC),2019.
[19] WHITE H S.Bootstrap confidence intervals for the correlation coefficient[J].IEEE Transactions on Communications,2019:786-796.
[20] WANG S K,DAI B R.A G-means update ensemble learning approach for the imbalanced data stream with concept drifts[C]//International Conference on Big Data Analytics and Knowledge Discovery,2016.