Clustering XML documents based on feature order preference

Computer Engineering and Applications, 2016, Vol. 52, Issue (12): 64-68.

WANG Chengyong, DU Qingwei, SUN Jing, SUN Zhen

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Online: 2016-06-15. Published: 2016-06-14.

Abstract

XML document clustering plays an important role in many data application domains. The proposed algorithm selects features from XML documents, represents each document as a vector in an n-dimensional feature space, and then applies CFP (Clustering with Feature order Preference): it assigns a weight to each feature according to the feature order preference and updates the weights in every clustering iteration. Experimental results show that combining the feature order preference weights of CFP with the level weights used when vectorizing the XML documents offsets the shortcomings of vectorization and improves the precision of XML document clustering.

Key words: Extensible Markup Language (XML) document clustering, level weight, feature order preference
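
The pipeline summarized in the abstract (feature selection, level-weighted vectorization, per-iteration feature weight updates) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption made for illustration: tag paths as features, a level weight of `decay ** depth`, an inverse-dispersion weight update, and preferences expressed as index pairs `(i, j)` meaning "feature i should weigh at least as much as feature j". It is not the authors' CFP implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def path_features(xml_text, decay=0.5):
    """Level-weighted counts of root-to-element tag paths (assumed scheme)."""
    root = ET.fromstring(xml_text)
    feats = defaultdict(float)
    stack = [(root, root.tag, 0)]
    while stack:
        node, path, depth = stack.pop()
        feats[path] += decay ** depth          # deeper elements count less
        for child in node:
            stack.append((child, path + "/" + child.tag, depth + 1))
    return dict(feats)

def vectorize(docs, decay=0.5):
    """Turn XML strings into vectors over a shared, sorted path vocabulary."""
    maps = [path_features(d, decay) for d in docs]
    vocab = sorted({p for m in maps for p in m})
    return vocab, [[m.get(p, 0.0) for p in vocab] for m in maps]

def weighted_sq_dist(x, c, w):
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def update_weights(X, labels, centers, prefs):
    """Hypothetical update: low within-cluster dispersion gives a high weight,
    then one pass swaps weights that violate a preference pair (i, j)."""
    n_feat = len(X[0])
    disp = [0.0] * n_feat
    for x, lab in zip(X, labels):
        for j in range(n_feat):
            disp[j] += (x[j] - centers[lab][j]) ** 2
    w = [1.0 / (1.0 + d) for d in disp]
    for i, j in prefs:
        if w[i] < w[j]:
            w[i], w[j] = w[j], w[i]
    total = sum(w)
    return [wj / total for wj in w]

def cluster(X, k, prefs, iters=10):
    """Weighted k-means style loop; the first k documents seed the centers."""
    n_feat = len(X[0])
    w = [1.0 / n_feat] * n_feat
    centers = [list(X[i]) for i in range(k)]
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], w))
                  for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        w = update_weights(X, labels, centers, prefs)  # weights change every iteration
    return labels, w

# Toy documents (made up for this sketch): two "book" and two "article" shapes.
docs = [
    "<book><title/><author/></book>",
    "<article><title/><journal/></article>",
    "<book><title/><author/><author/></book>",
    "<article><title/><journal/><year/></article>",
]
vocab, X = vectorize(docs)
prefs = [(vocab.index("book"), vocab.index("book/author"))]
labels, w = cluster(X, k=2, prefs=prefs)
```

In this toy run, documents with the same root structure end up in the same cluster, and the feature weights stay normalized while respecting the single preference pair. A real implementation would follow the weighting and update rules defined in the paper itself.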

WANG Chengyong, DU Qingwei, SUN Jing, SUN Zhen. Clustering XML documents based on feature order preference[J]. Computer Engineering and Applications, 2016, 52(12): 64-68.
