Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (22): 158-160.

• Database and Information Processing •

Graph indexing based on automatic clustering model selection

ZHENG Ai-hua, TANG Jin, LUO Bin

  1. School of Computer Science and Technology, Anhui University, Hefei 230039, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-08-01 Published:2007-08-01
  • Contact: ZHENG Ai-hua

Graph database indexing with automatic selection of clustering model parameters

郑爱华,汤 进,罗 斌   

  1. School of Computer Science and Technology, Anhui University, Hefei 230039
  • Corresponding author: 郑爱华

Abstract: A graph database indexing method based on pattern clustering and automatic model selection is proposed. The traditional Expectation Maximization (EM) algorithm provides an effective method for parameter estimation in mixture model clustering, but the number of mixture components must be fixed before clustering starts, which reduces the accuracy of high-dimensional indexing. The proposed indexing method combines an automatic mixture model selection algorithm, built on an improved Component-wise EM of Mixture (CEM2) algorithm, with vector quantization and a probabilistic approximation mechanism. Experimental results show that retrieval efficiency increases while the true positive rate remains high.
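The limitation described above, that plain EM needs the number of components fixed in advance, can be illustrated with a minimal sketch. The abstract does not give the details of the improved CEM2 algorithm, so the code below does not reproduce it. It swaps in a simpler and commonly used stand-in: run plain EM once per candidate component count and keep the count with the lowest Bayesian Information Criterion (BIC). The one-dimensional data, the function names and the variance floor are all assumptions made for illustration.

```python
import math

def log_likelihood(xs, weights, means, variances):
    """Log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total

def fit_gmm_1d(xs, k, iters=200, var_floor=1e-3):
    """Plain EM for a 1-D Gaussian mixture with a FIXED number k of components."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic initialisation: spread the means over the sorted data.
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            s = max(sum(dens), 1e-300)
            resp.append([d / s for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)  # floor avoids collapsed components
            weights[j] = nj / n
    return weights, means, variances

def select_components(xs, max_k):
    """Try k = 1..max_k and return (k, weights, means, variances) with lowest BIC."""
    n = len(xs)
    best = None
    for k in range(1, max_k + 1):
        weights, means, variances = fit_gmm_1d(xs, k)
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bic = (-2.0 * log_likelihood(xs, weights, means, variances)
               + n_params * math.log(n))
        if best is None or bic < best[0]:
            best = (bic, k, weights, means, variances)
    return best[1:]

# Hypothetical toy data: two well-separated groups of points.
group = [-0.6, -0.4, -0.3, -0.1, 0.0, 0.1, 0.2, 0.4, 0.5, 0.7]
data = group + [x + 10.0 for x in group]

k, weights, means, variances = select_components(data, max_k=4)
print("selected components:", k)
print("means:", [round(m, 2) for m in means])
```

Refitting for every candidate count is simple but costly. Component-wise schemes such as the one named in the abstract aim to settle the component count within a single run, which this sketch does not attempt.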

Key words: automatic model selection, improved Component-wise EM of Mixture (CEM2) algorithm, vector quantization, probabilistic approximation

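Abstract (translated from Chinese): A graph database indexing method based on pattern clustering and automatic selection of mixture model parameters is proposed. The traditional EM (Expectation Maximization) algorithm offers a good solution to parameter estimation in mixture model clustering, but it requires the number of clusters to be specified in advance, which impairs both the accuracy and the efficiency of high-dimensional data indexing. By combining an improved CEM2 (Component-wise EM of Mixture) algorithm for automatic mixture model selection with vector quantization and a probabilistic approximate indexing mechanism, the method effectively improves retrieval efficiency while maintaining accuracy.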
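The role of vector quantization in an index can be sketched as follows. The abstract does not say how graphs are turned into feature vectors or how the probabilistic approximation is defined, so everything here is assumed for illustration: the two-dimensional feature vectors, the fixed codebook (which in practice might come from the cluster centres of a fitted mixture model), the `VQIndex` class and the `n_probe` parameter. Probing only the nearest few cells is a generic approximate-search device standing in for the paper's mechanism, not a reproduction of it.

```python
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class VQIndex:
    """Bucket index: each item is stored under its nearest codeword (hypothetical)."""

    def __init__(self, codebook):
        self.codebook = [tuple(c) for c in codebook]
        self.buckets = defaultdict(list)

    def _ranked_cells(self, vec):
        # Codeword indices ordered from nearest to farthest.
        return sorted(range(len(self.codebook)),
                      key=lambda i: sq_dist(vec, self.codebook[i]))

    def add(self, item_id, vec):
        # Quantize: assign the vector to the cell of its nearest codeword.
        cell = self._ranked_cells(vec)[0]
        self.buckets[cell].append((item_id, tuple(vec)))

    def query(self, vec, top_k=1, n_probe=1):
        # Approximate search: scan only the n_probe nearest cells,
        # then rank the candidates found there by exact distance.
        candidates = []
        for cell in self._ranked_cells(vec)[:n_probe]:
            candidates.extend(self.buckets.get(cell, []))
        candidates.sort(key=lambda item: sq_dist(vec, item[1]))
        return [item_id for item_id, _ in candidates[:top_k]]

# Hypothetical codebook and graph feature vectors.
index = VQIndex(codebook=[(0.0, 0.0), (10.0, 10.0)])
index.add("g1", (0.5, 0.2))
index.add("g2", (1.0, 1.0))
index.add("g3", (9.0, 9.5))
index.add("g4", (10.5, 10.0))

print(index.query((9.2, 9.4), top_k=1))             # scans one cell only
print(index.query((0.0, 0.0), top_k=4, n_probe=2))  # scans both cells
```

A small `n_probe` keeps queries fast but can miss true neighbours that fall just across a cell boundary. Raising it trades speed for recall, which mirrors the efficiency versus true-positive-rate balance the abstract reports.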

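Key words (translated from Chinese): automatic model selection, improved CEM2 algorithm, vector quantization, probabilistic approximate indexing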