%0 Journal Article
%A YUAN Fang
%A YANG Youlong
%T Improved Distance Formula of K-modes Clustering Algorithm for Mixed Categorical Attribute Data
%D 2020
%R 10.3778/j.issn.1002-8331.1901-0423
%J Computer Engineering and Applications
%P 186-193
%V 56
%N 6
%X

The traditional K-modes algorithm is widely used for clustering categorical data, but it does not distinguish between ordinal and nominal (unordered) categorical attributes. This paper distinguishes the two attribute types, proposes a new distance formula, and optimizes the algorithm flow. The reasonable range for the distance between two adjacent values of an ordinal attribute is determined from the distance values of the nominal attributes. The distance formula for ordinal attributes is then constructed from the order relation among their values. The proportion of each attribute value within a cluster is introduced as a distance parameter for computing the distance between a data point and the centroid. The new formula describes distances between ordinal attribute values well and balances the difference between the distance formulas of the two attribute types. Experimental results on real UCI data sets show that the improved algorithm and distance formula proposed in this paper are more effective than the original K-modes algorithm and existing improved versions of it.
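
The abstract does not state the distance formula itself, so the following is only a minimal sketch of the general idea, not the authors' formula. It assumes that a nominal attribute contributes one minus the proportion of cluster members sharing the point's value, and that an ordinal attribute contributes a proportion-weighted rank gap scaled to [0, 1], so that one ordinal step never exceeds a full nominal mismatch. All function names, the scaling, and the toy data are hypothetical.

```python
from collections import Counter


def ordinal_gap(a, b, levels):
    """Rank gap between two ordinal values, scaled to [0, 1] (assumed scaling)."""
    if len(levels) < 2:
        return 0.0
    return abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)


def nominal_distance(value, cluster_values):
    """One minus the share of cluster members that hold `value`."""
    if not cluster_values:
        return 1.0
    return 1.0 - Counter(cluster_values)[value] / len(cluster_values)


def ordinal_distance(value, cluster_values, levels):
    """Average rank gap to the cluster's values, weighted by their proportions."""
    if not cluster_values:
        return 1.0
    n = len(cluster_values)
    counts = Counter(cluster_values)
    return sum(c / n * ordinal_gap(value, v, levels) for v, c in counts.items())


def mixed_distance(point, cluster, ordinal_levels):
    """Distance from `point` (dict) to `cluster` (list of dicts).

    `ordinal_levels` maps each ordinal attribute name to its ordered value list;
    every other attribute is treated as nominal.
    """
    total = 0.0
    for attr, value in point.items():
        column = [row[attr] for row in cluster]
        if attr in ordinal_levels:
            total += ordinal_distance(value, column, ordinal_levels[attr])
        else:
            total += nominal_distance(value, column)
    return total


# Toy data, invented purely for illustration.
LEVELS = {"size": ["small", "medium", "large"]}
CLUSTER = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "medium"},
    {"color": "blue", "size": "large"},
    {"color": "red", "size": "small"},
]

if __name__ == "__main__":
    print(mixed_distance({"color": "red", "size": "small"}, CLUSTER, LEVELS))
```

Under these assumptions the cluster is summarized by value proportions rather than by a single mode, so a point holding a common value sits closer to the cluster than a point holding a rare one.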

%U http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1901-0423