Computer Engineering and Applications, 2016, Vol. 52, Issue (12): 64-68.


Clustering XML documents based on feature order preference

WANG Chengyong, DU Qingwei, SUN Jing, SUN Zhen   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Online: 2016-06-15 Published: 2016-06-14

基于特征偏好的XML文档聚类算法

王成勇,杜庆伟,孙  静,孙  振   

  1. 南京航空航天大学 计算机科学与技术学院,南京 210016

Abstract: Clustering XML documents plays an important role in many data application domains. The proposed algorithm, which clusters XML documents using feature order preference, selects features from the XML documents, represents each document as a vector in an n-dimensional feature space, assigns a weight to each feature according to the feature order preference, and updates the weights in every clustering iteration. Experimental results show that combining the feature order preference weights of CFP (Clustering with Feature order Preference) with the level weights used in XML document representation offsets the shortcomings of vectorizing XML documents and improves the precision of XML document clustering.
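
The pipeline summarized in the abstract (feature selection, level-weighted vectorization, preference-ordered feature weights, per-iteration weight updates) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption rather than the authors' CFP algorithm: tag paths serve as features, the level weight is taken as 1/depth, feature weights are updated from within-cluster dispersion in the style of a generic feature-weighted k-means, and the preference order is enforced by reassigning sorted weights. The names `path_features`, `vectorize`, `cluster`, and the `preference` argument are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def path_features(xml_text):
    """Map each root-to-node tag path to a level-weighted count.
    Assumption: a node at depth d contributes 1/d, so shallow structure counts more."""
    root = ET.fromstring(xml_text)
    feats = defaultdict(float)
    stack = [(root, root.tag, 1)]
    while stack:
        node, path, depth = stack.pop()
        feats[path] += 1.0 / depth
        for child in node:
            stack.append((child, path + "/" + child.tag, depth + 1))
    return dict(feats)

def vectorize(docs):
    """Build a shared, sorted feature vocabulary and one vector per document."""
    feats = [path_features(d) for d in docs]
    vocab = sorted({p for f in feats for p in f})
    return vocab, [[f.get(p, 0.0) for p in vocab] for f in feats]

def weighted_dist(x, c, w):
    """Squared Euclidean distance with one weight per feature."""
    return sum(wj * (xj - cj) ** 2 for wj, xj, cj in zip(w, x, c))

def cluster(vectors, k, preference, iters=10):
    """Feature-weighted k-means sketch.
    preference: feature indices ordered from most to least important (hypothetical input)."""
    n = len(vectors[0])
    w = [1.0 / n] * n
    centers = [list(v) for v in vectors[:k]]  # naive init: the first k documents
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center under the current feature weights.
        labels = [min(range(k), key=lambda c: weighted_dist(v, centers[c], w))
                  for v in vectors]
        # Center update: per-feature mean of each non-empty cluster.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Weight update: features with low within-cluster dispersion get more weight.
        disp = [sum((v[j] - centers[l][j]) ** 2 for v, l in zip(vectors, labels))
                for j in range(n)]
        raw = [1.0 / (1.0 + d) for d in disp]
        total = sum(raw)
        w = [r / total for r in raw]
        # Enforce the preference order: the largest weight among the listed
        # features goes to the most preferred one, and so on down the list.
        ranked = sorted((w[j] for j in preference), reverse=True)
        for j, value in zip(preference, ranked):
            w[j] = value
    return labels, w

docs = [
    "<book><title/><author/></book>",
    "<person><name/><age/></person>",
    "<book><title/><author/><author/></book>",
    "<person><name/></person>",
]
vocab, vectors = vectorize(docs)
# Hypothetical preference: "book/author" should matter at least as much as "book".
preference = [vocab.index("book/author"), vocab.index("book")]
labels, weights = cluster(vectors, k=2, preference=preference)
```

In this toy run the two `book` documents and the two `person` documents end up in separate clusters, and the weight of `book/author` is kept at least as large as that of `book` even though its dispersion alone would rank it lower. A real implementation would need a principled initialization and the weight-update rule defined in the paper itself.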

Key words: Extensible Markup Language (XML) document clustering, level weight, feature order preference

摘要: XML文档聚类在众多数据应用领域都具有重要作用。基于特征偏好的XML文档聚类算法是对XML文档进行特征选择,将XML文档描述为[n]维特征向量,再结合CFP(Clustering with Feature order Preference)算法,根据特征偏好为其赋予权重,每次迭代聚类过程中进行权重的更新。实验结果表明当CFP算法中的特征偏好权重和XML文档向量化时所用的层次权重设定相结合时,可弥补XML文档向量化时的弊端,提高了XML文档聚类的精度。

关键词: 可扩展标记语言(XML)文档聚类, 层次权重, 特征偏好