Naive Bayes based on data filling and continuous attribute

Computer Engineering and Applications, 2016, Vol. 52, Issue (1): 133-140.

LI Zhongbo, YANG Jianhua, LIU Wenqi

School of Control Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China

Online: 2016-01-01 Published: 2015-12-30

Abstract

For classification problems, Naive Bayes (NB) usually assumes that continuous numerical attributes follow a normal distribution, and its classification accuracy also depends on the completeness of the training data. Real sampled data rarely meet either requirement. For missing data, the Naive Bayes classifier learns its parameters from the available incomplete data using the Expectation-Maximization (EM) algorithm. For continuous numerical attributes that are not normally distributed, the distribution density obtained by kernel density estimation is used, together with a new calculation method, to find the maximum posterior probability. Classification experiments on standard data sets verify the effectiveness of these improvements. Finally, the improved algorithm (EM-DNB) is applied to predicting protein purification processes in biological engineering, and the experimental results show that prediction accuracy improves.

Key words: Naive Bayes (NB), Expectation-Maximization (EM) algorithm, continuous attributes, kernel density estimation, protein purification
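
The kernel density idea in the abstract can be illustrated with a minimal sketch. This is not the authors' EM-DNB algorithm: it omits the EM step for missing data and the paper's new calculation method, and only shows a Naive Bayes classifier whose class-conditional densities come from a Gaussian kernel density estimate instead of a fitted normal distribution. The class name `KDENaiveBayes`, the helper functions, and the choice of Silverman's rule for the bandwidth are all assumptions made for illustration.

```python
import numpy as np


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth (an assumed choice; the abstract names none)."""
    n = len(samples)
    if n < 2:
        return 1.0
    std = samples.std(ddof=1)
    return 1.06 * std * n ** (-1 / 5) if std > 0 else 1.0


def gaussian_kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate at point x from 1-D samples."""
    z = (x - samples) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z ** 2).sum() / norm


class KDENaiveBayes:
    """Hypothetical Naive Bayes with one KDE per (class, attribute) pair."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_priors_ = {}
        self.samples_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            self.log_priors_[c] = np.log(len(Xc) / len(X))
            # Store the training values of each attribute for this class.
            self.samples_[c] = [Xc[:, j] for j in range(X.shape[1])]
        return self

    def _log_posterior(self, x, c):
        # log P(c) + sum_j log p(x_j | c), using the independence assumption.
        score = self.log_priors_[c]
        for value, samples in zip(x, self.samples_[c]):
            h = silverman_bandwidth(samples)
            # The tiny constant avoids log(0) far from all training samples.
            score += np.log(gaussian_kde_pdf(value, samples, h) + 1e-300)
        return score

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([
            max(self.classes_, key=lambda c: self._log_posterior(x, c))
            for x in X
        ])


# Usage: class 0 is bimodal (clearly non-normal), class 1 sits near zero.
X_train = [[-3.2], [-3.0], [-2.8], [2.8], [3.0], [3.2],
           [-0.2], [-0.1], [0.0], [0.1], [0.2], [0.0]]
y_train = [0] * 6 + [1] * 6
model = KDENaiveBayes().fit(X_train, y_train)
print(model.predict([[3.0], [0.05], [-2.9]]))
```

The sketch works in log space so that the product of per-attribute densities does not underflow when there are many attributes.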

LI Zhongbo, YANG Jianhua, LIU Wenqi. Naive Bayes based on data filling and continuous attribute[J]. Computer Engineering and Applications, 2016, 52(1): 133-140.
