Computer Engineering and Applications ›› 2023, Vol. 59 ›› Issue (18): 84-97. DOI: 10.3778/j.issn.1002-8331.2211-0411

• Theory, Research and Development •

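Naive Bayes Fusion of Multiple Attribute Weighting and Orthogonal Transformation

LIU Haitao, CHEN Chunmei, PANG Zhongxiang, LIANG Zhiqiang, LI Qing

  1. College of Information Engineering, Southwest University of Science and Technology, Mianyang, Sichuan 621002, China
  • Published: 2023-09-15 Online: 2023-09-15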


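Abstract: The Naive Bayes algorithm ignores correlations among the multidimensional attributes of data, which severely limits its applicability as a classifier. To address this, an improved Naive Bayes algorithm is proposed that fuses multiple types of attribute weighting with orthogonal transformation. First, contribution degree and correlation mutual information are used to quantify the degree of correlation among discrete attributes and among discrete attribute values, which yields their weights. Second, an orthogonal transformation removes the linear relationships among continuous attributes. Finally, the conditional probabilities of the weighted discrete attributes and of the orthogonally transformed continuous attributes are computed separately, which raises classification accuracy and improves the generalization ability of the algorithm. In k-fold cross-validation on public datasets and a campus card dataset, the proposed algorithm achieves accuracy 7.19 to 9.94 percentage points higher, and weighted-average F1 6.4 to 11.64 percentage points higher, than five recent improved Naive Bayes algorithms.

Key words: multidimensional mixed attributes, discrete attribute weighting, discrete attribute value weighting, orthogonal transformation, k-fold cross-validation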
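
The three steps summarized in the abstract (weight the discrete attributes, decorrelate the continuous attributes, then compute the two groups of conditional probabilities separately) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the contribution-degree or correlation mutual information formulas, so plain mutual information between each discrete attribute and the class label is used here as a stand-in weight, attribute-value weighting is omitted, and the orthogonal transformation is assumed to be a rotation onto the eigenvectors of the covariance matrix. The names `mutual_information` and `HybridWeightedNB` are hypothetical.

```python
import numpy as np


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (np.mean(x == xv) * np.mean(y == yv)))
    return mi


class HybridWeightedNB:
    """Illustrative mixed-attribute naive Bayes (assumed design, not the paper's)."""

    def fit(self, X_disc, X_cont, y):
        self.classes_ = np.unique(y)
        d = X_disc.shape[1]
        # Stand-in attribute weights: mutual information with the class, scaled to mean 1.
        mi = np.array([mutual_information(X_disc[:, j], y) for j in range(d)])
        self.weights_ = mi * d / mi.sum() if mi.sum() > 0 else np.ones(d)
        # Assumed orthogonal transformation: rotate onto covariance eigenvectors.
        self.mean_ = X_cont.mean(axis=0)
        cov = np.atleast_2d(np.cov(X_cont - self.mean_, rowvar=False))
        _, self.rotation_ = np.linalg.eigh(cov)
        Z = (X_cont - self.mean_) @ self.rotation_
        self.values_ = [np.unique(X_disc[:, j]) for j in range(d)]
        self.log_prior_, self.tables_, self.mu_, self.var_ = {}, {}, {}, {}
        for c in self.classes_:
            mask = y == c
            self.log_prior_[c] = np.log(mask.mean())
            # Laplace-smoothed conditional probability table for each discrete attribute.
            self.tables_[c] = [
                {v: (np.sum(X_disc[mask, j] == v) + 1) / (mask.sum() + len(vals))
                 for v in vals}
                for j, vals in enumerate(self.values_)
            ]
            # Per-class Gaussian parameters in the rotated coordinates.
            self.mu_[c] = Z[mask].mean(axis=0)
            self.var_[c] = Z[mask].var(axis=0) + 1e-9
        return self

    def predict(self, X_disc, X_cont):
        Z = (X_cont - self.mean_) @ self.rotation_
        preds = []
        for xd, z in zip(X_disc, Z):
            scores = []
            for c in self.classes_:
                s = self.log_prior_[c]
                for j, v in enumerate(xd):
                    # Values unseen in training get a small fallback probability.
                    floor = 1.0 / (len(self.values_[j]) + 1)
                    s += self.weights_[j] * np.log(self.tables_[c][j].get(v, floor))
                # Gaussian log-likelihood of the decorrelated continuous attributes.
                s += np.sum(-0.5 * np.log(2 * np.pi * self.var_[c])
                            - (z - self.mu_[c]) ** 2 / (2 * self.var_[c]))
                scores.append(s)
            preds.append(self.classes_[int(np.argmax(scores))])
        return np.array(preds)
```

Calling `HybridWeightedNB().fit(X_disc, X_cont, y)` with an integer-coded discrete matrix, a real-valued continuous matrix, and a label vector, followed by `predict(X_disc, X_cont)`, returns one predicted label per row. Each discrete log-probability is multiplied by its attribute weight, so an attribute that carries little information about the class contributes less to the score.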
