Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (27): 151-153. DOI: 10.3778/j.issn.1002-8331.2008.27.048

• Database, Signal and Information Processing •

Clustering analysis methods of gene expression data based on representative entropy

LU Yuan, YANG Hui-zhong

  1. School of Communication & Control Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2007-11-13 Revised: 2008-02-29 Online: 2008-09-21 Published: 2008-09-21
  • Contact: LU Yuan

Abstract: Gene expression data have few samples and high dimensionality, and prior knowledge for sample typing is often lacking. To address this, a two-way clustering algorithm based on representative entropy is proposed that draws on the advantages of the self-organizing feature map (SOM). First, genes are clustered with an SOM network, and characteristic genes are selected according to their fluctuation coefficient. Next, the value of the representative entropy is used to judge the quality of the gene clustering and to determine the number of neurons in the network. Finally, the fuzzy c-means (FCM) clustering algorithm is applied to the selected characteristic gene set to type the samples. The algorithm is applied to two public gene expression data sets; the experimental results show that it reduces the feature dimensionality while achieving a high clustering accuracy.
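
The abstract names representative entropy and FCM but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes one common definition of representative entropy from the feature-selection literature (the entropy of the covariance eigenvalues normalized to sum to 1) and the textbook FCM update rules. The function names `representative_entropy` and `fuzzy_c_means`, the fuzzifier `m=2.0`, and the stopping tolerance are illustrative choices. The SOM gene clustering and the fluctuation-coefficient selection are not shown.

```python
import numpy as np


def representative_entropy(X):
    """Entropy of the normalized covariance eigenvalues of X.

    X has shape (n_observations, n_features). This is an assumed
    definition; the paper's exact formula is not given in the abstract.
    """
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Clip tiny negative eigenvalues caused by floating-point error.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    total = eigvals.sum()
    if total == 0.0:
        return 0.0
    p = eigvals / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on the rows of X.

    Returns (centers, U), where U[i, k] is the membership of row i
    in cluster k and each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the rows.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every row to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Under this definition the entropy is low when the variance is concentrated in a few directions (redundant features) and high when it is spread evenly. In a pipeline like the one described, the selected genes would form the columns of `X` and the samples its rows before calling `fuzzy_c_means`; that orientation is also an assumption.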