Computer Engineering and Applications ›› 2011, Vol. 47 ›› Issue (28): 237-240.

• Engineering and Applications •

OMB biclustering algorithm for cancer gene expression data

WANG Changwu, LIU Nannan, JIA Yongwei, WANG Baowen, LIU Wenyuan

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
  • Online: 2011-10-01 Published: 2011-10-01

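Abstract: Cluster analysis of cancer gene expression data can provide a basis for early cancer diagnosis and accurate classification of cancer subtypes. Based on the characteristics of cancer gene expression data, a biclustering algorithm named OMB (Override Matrix Bicluster) is proposed. OMB searches the rows and columns of the gene expression data matrix for those below a threshold and applies a deletion-addition algorithm to generate a submatrix. It then builds a covering matrix of the same size as the gene expression matrix to mark the position of the submatrix produced in the previous iteration. Within the marked matrix, the greedy iterative search is repeated until K biclusters are found. Matlab experiments show that OMB achieves better clustering results on cancer gene expression data with overlapping structure.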

Key words: cancer gene expression data, biclustering algorithm, greedy iteration, covering matrix, mean squared residue
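The abstract gives no pseudocode, so the following is only a minimal Python sketch of the ingredients it names, not the authors' implementation. It assumes the threshold applies to the mean squared residue listed in the key words, and it uses a Cheng and Church style single-node deletion as a stand-in for the deletion half of the deletion-addition step; the addition step and the way OMB consults the covering matrix in later iterations are not specified here and are left out. The function names `mean_squared_residue`, `greedy_delete`, and `mark_cover`, and the parameter `delta`, are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Return H(I, J) for the submatrix rows x cols, plus the squared residues."""
    row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
    total_mean = sum(row_mean.values()) / len(rows)
    squared = {
        (i, j): (matrix[i][j] - row_mean[i] - col_mean[j] + total_mean) ** 2
        for i in rows
        for j in cols
    }
    score = sum(squared.values()) / (len(rows) * len(cols))
    return score, squared


def greedy_delete(matrix, delta):
    """Assumed deletion phase: drop the worst row or column until H <= delta."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    while len(rows) > 1 and len(cols) > 1:
        score, squared = mean_squared_residue(matrix, rows, cols)
        if score <= delta:
            break
        # Average squared residue contributed by each row and each column.
        row_score = {i: sum(squared[i, j] for j in cols) / len(cols) for i in rows}
        col_score = {j: sum(squared[i, j] for i in rows) / len(rows) for j in cols}
        worst_row = max(rows, key=row_score.get)
        worst_col = max(cols, key=col_score.get)
        if row_score[worst_row] >= col_score[worst_col]:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols


def mark_cover(cover, rows, cols):
    """Flag the cells of a found bicluster in a same-sized boolean matrix."""
    for i in rows:
        for j in cols:
            cover[i][j] = True
    return cover


# Toy data: rows 0-2 follow an additive pattern, row 3 does not.
data = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [9, 0, 7]]
rows, cols = greedy_delete(data, delta=0.01)  # rows [0, 1, 2], cols [0, 1, 2]
cover = mark_cover([[False] * 3 for _ in range(4)], rows, cols)
```

Keeping the covering matrix separate from the expression matrix leaves the original values untouched, which is one plausible way for later searches to find biclusters that overlap earlier ones.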
