Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (9): 152-155.

• Database, Signal and Information Processing •

Algorithm for clustering gene expression data using connected components

ZHOU Hai-yan, YAN Yun-yang

  1. Department of Computer Engineering, Huaiyin Institute of Technology, Huaian, Jiangsu 223001, China
  • Received: 2007-03-09 Revised: 2007-09-13 Online: 2008-03-21 Published: 2008-03-21
  • Contact: ZHOU Hai-yan

Abstract: In the life sciences, species and genes need to be classified in order to understand the inherent structure of populations. Data clustering methods can effectively identify patterns in gene expression data and classify them, grouping items with highly similar features into the same class and items with highly dissimilar features into different classes. This is important for studying the structure and function of genes and the relationships between different kinds of genes. A graph-theoretic method is used to perform an initial clustering of gene expression data from molecular biology, and the result is then refined by combining it with another algorithm, such as the k-nearest-neighbor self-learning clustering algorithm or the medoid-based self-learning clustering algorithm. For a given clustering criterion, the method can produce globally optimal clusters. Finally, the algorithm is analyzed and discussed, and it is verified experimentally on simulated data.

Key words: gene expression data, data clustering, cluster, undirected graph, connected components
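
The initial clustering step described in the abstract (build an undirected graph over the expression profiles, then take its connected components as clusters) might be sketched roughly as follows. The abstract does not say how edges are defined, so this sketch assumes an edge joins two profiles whenever their Euclidean distance is at most a chosen threshold. The function name `connected_component_clusters`, the `threshold` parameter, and the sample data are hypothetical, and the refinement step (k-nearest-neighbor or medoid-based self-learning clustering) is not shown.

```python
import math
from typing import Dict, List, Sequence


def connected_component_clusters(
    profiles: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Return clusters of profile indices, one per connected component.

    Assumption (not from the paper): two profiles are joined by an edge
    when their Euclidean distance is <= threshold.
    """
    n = len(profiles)
    parent = list(range(n))  # union-find forest, one node per profile

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair of profiles joined by an edge.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(profiles[i], profiles[j]) <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Collect the indices that share a root into one cluster.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy data: five 2-dimensional "expression profiles".
sample = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [10.0, 0.0]]
clusters = connected_component_clusters(sample, threshold=1.0)
```

With this toy data and threshold, profiles 0 and 1 fall into one component, 2 and 3 into another, and profile 4 stays on its own. The pairwise loop is quadratic in the number of profiles, which is acceptable for a sketch but may need a spatial index or a sparse similarity graph for larger data sets.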