Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (20): 208-211. DOI: 10.3778/j.issn.1002-8331.2008.20.063

• Engineering and Applications •

Research on a new classification algorithm for colorectal carcinoma diagnosis data

LIAO Zhi-fang1, FAN Xiao-ping1, CHEN Yu-zhou1, LIAO Zhi-ning2, QU Zhi-hua1,3

  1. School of Information Science and Engineering, Central South University, Changsha 410075, China
  2. Department of Computer Science, Faculty of Science, Loughborough University, Leicestershire, LE11 3TU, UK
  3. School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
  • Received: 2007-12-27 Revised: 2008-02-19 Online: 2008-07-11 Published: 2008-07-11
  • Contact: LIAO Zhi-fang

Abstract: Data classification is an important application of data mining in the analysis of medical data. After analyzing the characteristics of medical data, this paper takes early colorectal carcinoma diagnosis data as an example and proposes classifying it with the counting KNN algorithm (count-weighted k-nearest neighbours, CwKNN). Although CwKNN is simple and effective, it does not handle biomedical data well. Based on an analysis of its performance, a novel counting KNN algorithm based on an index tree and sample density is presented: the index tree improves the computational efficiency of the original algorithm, while variants based on overall (global) density and K-density improve its accuracy. Experiments on the early colorectal carcinoma data show that the new algorithm improves on the original in computational complexity, storage space, and classification accuracy, and that it outperforms both distance-based voting KNN and CwKNN. More importantly, it is a single method that works for numeric (ordinal), text (nominal), or mixed data.

Key words: early colorectal carcinoma diagnosis data, KNN by counting, overall density, K-density
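
The abstract does not give the paper's formulas, so the following is only a rough sketch of the general idea of a density-weighted nearest-neighbour vote over mixed data. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the function names `mixed_distance`, `k_density`, and `classify`, the distance (absolute difference for numbers, 0/1 mismatch for other values), and the definition of K-density as the inverse mean distance to the k nearest records. It also uses a brute-force scan where the paper uses an index tree.

```python
from collections import defaultdict


def mixed_distance(a, b):
    """Distance over mixed records: absolute difference for numbers,
    0/1 mismatch for everything else (an assumed, simple choice)."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)
        else:
            total += 0.0 if x == y else 1.0
    return total


def k_density(index, records, k):
    """Assumed K-density: inverse of the mean distance from records[index]
    to its k nearest other records (larger means a denser neighbourhood)."""
    dists = sorted(
        mixed_distance(records[index], other)
        for j, other in enumerate(records)
        if j != index
    )[:k]
    if not dists:  # a single-record training set has no neighbours
        return 1.0
    mean = sum(dists) / len(dists)
    return 1.0 / (mean + 1e-9)  # small constant avoids division by zero


def classify(query, records, labels, k=3):
    """Vote among the k nearest records, each vote weighted by K-density.
    Brute-force search; an index tree would replace the full sort."""
    nearest = sorted(
        range(len(records)),
        key=lambda i: mixed_distance(query, records[i]),
    )[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[labels[i]] += k_density(i, records, k)
    return max(votes, key=votes.get)
```

With a toy training set such as `[(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.3, "b")]` and one label per record, `classify((1.1, "a"), records, labels, k=2)` votes only among the two records nearest the query. Weighting votes by density lets records from tightly clustered regions count for more than isolated ones, which is one plausible reading of how sample density could improve accuracy.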