Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (6): 330-338. DOI: 10.3778/j.issn.1002-8331.2211-0218

• Engineering and Applications •

Hybrid LightGBM Model for Breast Cancer Diagnosis

XING Changzheng, XU Jiayu   

  1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Online: 2024-03-15 Published: 2024-03-15


Abstract: Breast cancer is one of the most common cancers, and its prevalence continues to rise every year. Predicting whether a mass is benign or malignant from measurements of the cell nuclei, without a surgical biopsy, can give doctors effective support for diagnosis and treatment and spare patients pain. To this end, a breast cancer diagnosis model based on the LightGBM algorithm is proposed. First, the borderline synthetic minority oversampling technique (Borderline-SMOTE) is used to mitigate the class imbalance in the breast cancer diagnosis data. Second, the sparrow search algorithm (SSA) is improved by introducing the piecewise linear chaotic map (PWLCM), a new inertia weight, and the criss-cross algorithm, and the improved SSA is then used to tune the parameters of LightGBM automatically. Third, because LightGBM is sensitive to noise, an OVR-Jacobian regularization method is proposed to reduce the effect of noise on LightGBM. Finally, the improved LightGBM hybrid model is used to diagnose breast cancer. The experimental results show that the proposed hybrid model outperforms common models on three metrics (mean square error, coefficient of determination, and cross-validation score), indicating better diagnostic performance.

Key words: breast cancer prediction, LightGBM, sparrow search algorithm, Borderline-SMOTE algorithm, machine learning, Jacobian regularization
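
The abstract names the PWLCM chaotic map as one of the changes made to SSA but does not give its formula or say where it is applied. As a rough illustration only, the sketch below uses the commonly cited form of the piecewise linear chaotic map and assumes it seeds the initial population of candidate parameter vectors, which is a frequent use of chaotic maps in swarm optimizers. The control parameter `p`, the seed `x0`, the helper names `pwlcm` and `init_population`, and the search ranges for the LightGBM parameters `learning_rate` and `num_leaves` are all hypothetical and are not taken from the paper.

```python
def pwlcm(x, p=0.4):
    """One step of the piecewise linear chaotic map for x in [0, 1].

    p is the control parameter and must lie in (0, 0.5).
    The value 0.4 is an illustrative assumption, not the paper's setting.
    """
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie in (0, 0.5)")
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about x = 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def init_population(n_agents, bounds, x0=0.37, p=0.4):
    """Build n_agents candidate vectors from one chaotic sequence.

    bounds is a list of (low, high) pairs, one per tuned parameter.
    Each chaotic value in [0, 1] is scaled linearly into its range.
    """
    population = []
    x = x0
    for _ in range(n_agents):
        agent = []
        for low, high in bounds:
            x = pwlcm(x, p)
            agent.append(low + x * (high - low))
        population.append(agent)
    return population


if __name__ == "__main__":
    # Hypothetical search ranges: learning_rate and num_leaves.
    search_space = [(0.01, 0.3), (8, 128)]
    for agent in init_population(5, search_space):
        print(agent)
```

Integer-valued parameters such as `num_leaves` would need rounding before being passed to LightGBM. The rest of the improved SSA (the new inertia weight and the criss-cross step) is not sketched here because the abstract does not describe it.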
