Computer Engineering and Applications ›› 2010, Vol. 46 ›› Issue (7): 32-33. DOI: 10.3778/j.issn.1002-8331.2010.07.010

• Research and Discussion •

Fuzzy C means cluster algorithm for co-regulation genes

ZHANG Li¹, PANG Huan-li¹, WANG Xiao-hu¹, WANG Jia²

  1. School of Computer Science and Engineering, Changchun University of Technology, Changchun 130021, China
  2. Web Center, Dalian Polytechnic University, Dalian, Liaoning 116034, China
  • Received: 2008-10-15 Revised: 2009-02-03 Online: 2010-03-01 Published: 2010-03-01
  • Contact: ZHANG Li

Abstract: Clustering methods play an important role in gene expression data analysis, but gene expression data has characteristics that distinguish it from data in other fields, so traditional distance measures and clustering methods cannot fully meet researchers' needs. The TransChisq distance, which is based on the Poisson distribution, offers a new perspective for defining the relationship between genes according to biological meaning, while fuzzy clustering can describe the complex interactions among genes more thoroughly. An improved fuzzy C-means clustering algorithm that uses the TransChisq distance is therefore applied to real gene expression data. The experimental results show that the method clusters the gene expression data in accordance with its true biological classification and finds more co-regulated genes.

Key words: fuzzy C-means clustering, gene expression data, distance

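Abstract (translated from the Chinese): Clustering methods play a very important role in gene expression data analysis, but gene expression data has its own characteristics compared with data from other fields, so traditional definitions of data distance and traditional clustering methods can no longer fully meet researchers' requirements for analyzing biological data. This paper proposes TransChisq, a distance measure based on the Poisson distribution, which defines the distance between gene data from an entirely new perspective. Because fuzzy clustering can describe complex gene interactions in greater depth, the TransChisq distance is combined with fuzzy clustering to improve the fuzzy C-means algorithm, which is then applied to the analysis of real gene expression data. The experimental results show that the method clusters gene expression data according to the true biological classification and can discover more co-regulated genes, better meeting the needs of gene expression data analysis.

Key words (translated from the Chinese): fuzzy C-means, gene expression data, distance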
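
The abstract describes fuzzy C-means clustering with a non-Euclidean, Poisson-motivated distance, but it does not give the TransChisq formula or the modified update rules. The sketch below is therefore only an illustration of the general idea, under stated assumptions: `chisq_style_distance` is a hypothetical chi-square-style stand-in (not the paper's TransChisq definition), cluster centers are assumed to be membership-weighted means of normalized expression profiles, and all function names and parameters are invented for this example.

```python
from typing import Callable, List, Sequence, Tuple

Profile = Sequence[float]


def to_shape(profile: Profile, eps: float = 1e-12) -> List[float]:
    """Normalize a count profile so its entries sum to (almost exactly) 1."""
    total = sum(profile) + eps
    return [v / total for v in profile]


def chisq_style_distance(profile: Profile, center: Profile, eps: float = 1e-12) -> float:
    """Chi-square-style deviation of observed counts from a cluster's expected shape.

    ASSUMPTION: illustrative stand-in only, not the paper's TransChisq formula.
    `center` holds the expected proportion of the gene's total count per condition.
    """
    total = sum(profile)
    dist = 0.0
    for observed, share in zip(profile, center):
        expected = share * total + eps
        dist += (observed - expected) ** 2 / expected
    return dist


def farthest_point_init(data, n_clusters, distance) -> List[List[float]]:
    """Deterministic start: first gene, then repeatedly the gene farthest from all centers."""
    centers = [to_shape(data[0])]
    while len(centers) < n_clusters:
        far = max(data, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(to_shape(far))
    return centers


def fuzzy_c_means(
    data: Sequence[Profile],
    n_clusters: int,
    m: float = 2.0,          # fuzzifier, must be > 1
    n_iter: int = 100,
    tol: float = 1e-6,
    distance: Callable[[Profile, Profile], float] = chisq_style_distance,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (memberships, centers); memberships[k][i] is gene k's degree in cluster i."""
    n, n_cond = len(data), len(data[0])
    shapes = [to_shape(p) for p in data]
    centers = farthest_point_init(data, n_clusters, distance)
    memberships = [[0.0] * n_clusters for _ in range(n)]

    for _ in range(n_iter):
        max_change = 0.0
        # Membership update: the distance is treated like a squared distance,
        # so u_ki is proportional to d_ki ** (-1 / (m - 1)).
        for k, profile in enumerate(data):
            d = [max(distance(profile, c), 1e-12) for c in centers]
            inv = [v ** (-1.0 / (m - 1.0)) for v in d]
            s = sum(inv)
            row = [v / s for v in inv]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(row, memberships[k])))
            memberships[k] = row
        # Center update (assumed): membership-weighted mean of the normalized shapes.
        for i in range(n_clusters):
            w = [memberships[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers[i] = [sum(w[k] * shapes[k][t] for k in range(n)) / w_sum for t in range(n_cond)]
        if max_change < tol:
            break
    return memberships, centers
```

Calling `fuzzy_c_means(data, n_clusters=2)` on a list of non-negative count profiles returns one membership row per gene. Because memberships are graded rather than hard, a gene can belong to several clusters at once, which is the property the abstract relies on to surface additional co-regulated genes. Swapping in a different `distance` function changes only the membership step in this sketch.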
