School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Selecting reasonable initial cluster centers is a prerequisite for correct clustering. Most existing K-means algorithms select the initial cluster centers at random and cannot handle outliers. To address these shortcomings, an improved K-means clustering algorithm that selects the initial cluster centers based on a dissimilarity measure is proposed. A dissimilarity matrix is constructed from the dissimilarities between data objects, and two measures, mean dissimilarity and total dissimilarity, are defined. The initial cluster centers are then determined according to criteria on these measures, and in subsequent iterations each cluster center is updated with the median of the data points in its cluster instead of the mean, so as to eliminate the effect of outliers on clustering accuracy. In addition, the proposed algorithm produces the same result on every run and is more robust with respect to both initialization and outliers. Finally, experiments are performed on synthetic datasets and UCI datasets. Compared with three classical clustering algorithms and two improved K-means algorithms, the proposed algorithm achieves better clustering performance.
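The building blocks named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract specifies neither the dissimilarity measure nor the selection criteria, so Euclidean distance, the row-sum and row-average definitions of total and mean dissimilarity, and all function names here are assumptions made for illustration. The sketch covers only the dissimilarity matrix, the two measures, and the median-based center update; the criteria for choosing initial centers are omitted.

```python
import math
from statistics import median


def dissimilarity_matrix(points):
    """Pairwise dissimilarities; Euclidean distance is an assumed choice."""
    return [[math.dist(p, q) for q in points] for p in points]


def total_dissimilarity(matrix):
    """Assumed definition: sum of one object's dissimilarities to all others."""
    return [sum(row) for row in matrix]


def mean_dissimilarity(matrix):
    """Assumed definition: total dissimilarity averaged over the other n - 1 objects.

    Requires at least two objects.
    """
    n = len(matrix)
    return [sum(row) / (n - 1) for row in matrix]


def median_center(cluster):
    """Coordinate-wise median of a cluster, used in place of the mean."""
    return tuple(median(coord) for coord in zip(*cluster))


def mean_center(cluster):
    """Classical K-means center, shown only for comparison."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


if __name__ == "__main__":
    # Four nearby points plus one outlier (hypothetical data).
    points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (50.0, 50.0)]
    totals = total_dissimilarity(dissimilarity_matrix(points))
    print("largest total dissimilarity at index", totals.index(max(totals)))
    print("median center:", median_center(points))
    print("mean center:", mean_center(points))
```

On this hypothetical data the outlier has the largest total dissimilarity, the mean center is pulled far toward it, and the coordinate-wise median stays within the group of four nearby points, which is the behavior the median update is meant to provide.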