Computer Engineering and Applications, 2011, Vol. 47, Issue (32): 167-169.

• Database, Signal and Information Processing •

HMM-based hierarchical clustering of gene expression time series data

ZHAO Guoqing, DENG Wei

  1. School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2011-11-11 Published: 2011-11-11

Abstract: DNA microarray technology produces large volumes of gene expression time series data, and clustering these data is an important way to extract the molecular biological information hidden in them. This paper presents a Hidden Markov Model-based Hierarchical Clustering (HMM-HC) method for analyzing gene expression time series data. The data are first preprocessed according to their statistical properties, including normalization and discretization. HMMs are then used to model the preprocessed data so as to exploit the dependency between different time points in each gene profile. Finally, the resulting HMMs are clustered with a hierarchical strategy to obtain a clustering of the data. Experimental results show that the method not only produces high-quality clusters but also identifies an appropriate number of clusters.

Key words: gene expression time series data, statistical properties, Hidden Markov Model (HMM), hierarchical clustering
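
The abstract names the stages of the method but not their details, so the following is only a minimal Python sketch of the first two stages under stated assumptions. It z-scores one gene profile, discretizes it into a three-symbol alphabet, and scores the symbol sequence under a discrete HMM with the scaled forward algorithm. The three-level alphabet, the 0.5 threshold, the two-state topology, and all probability values are hypothetical choices for illustration and are not taken from the paper. The hierarchical clustering of the fitted models is not shown.

```python
import math
from statistics import mean, pstdev


def normalize(profile):
    """Z-score one gene's expression profile (zero mean, unit variance)."""
    mu, sigma = mean(profile), pstdev(profile)
    if sigma == 0:
        return [0.0] * len(profile)  # flat profile: no variation to scale
    return [(x - mu) / sigma for x in profile]


def discretize(z_scores, threshold=0.5):
    """Map each z-score to a symbol: 0 = low, 1 = medium, 2 = high.

    The alphabet size and the threshold are assumptions for this sketch.
    """
    return [0 if z < -threshold else 2 if z > threshold else 1 for z in z_scores]


def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | HMM) for a discrete HMM."""
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            # Propagate one step through the transitions, then emit symbol t.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbols[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        log_p += math.log(scale)  # the scaling factors multiply to P(symbols)
        alpha = [a / scale for a in alpha]
    return log_p


# Hypothetical two-state HMM over the three symbols (illustrative values only).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

profile = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up expression values for one gene
symbols = discretize(normalize(profile))
print(symbols)  # [0, 0, 1, 2, 2]
print(log_likelihood(symbols, start, trans, emit))
```

In a full pipeline, one plausible use of such log-likelihoods is to compare how well each candidate model explains each profile, which could then drive the merging decisions of the hierarchical step. Whether the paper does this is not stated in the abstract.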