Feature-Reduction Fuzzy Clustering Algorithm Based on Marginal Kurtosis Measure
Author:
Affiliation:

School of Digital Media, Jiangnan University

Author biography:

Corresponding author:

CLC number:

TP273

Fund project:

National Natural Science Foundation of China, General Program (Grant 61572236)

    Abstract:

    The feature-reduction fuzzy c-means (FRFCM) algorithm proposed by Yang et al. [7] is effective for clustering data that contain unimportant or redundant features. The algorithm measures the importance of each feature by its mean-to-variance ratio (MVR), discards unimportant features whose weights fall below a threshold, and clusters on the remaining important features only, which improves both performance and speed. However, FRFCM has the following shortcomings: 1) the MVR value of a feature changes when the data are normalized, so the MVR of an important feature may become small and that of an unimportant feature may become large; 2) in some datasets, the important features do not necessarily have large MVR values; 3) the assignment of feature weights depends on initialization: an improper initialization can lead the algorithm to assign small weights to important features and large weights to unimportant ones, so that during clustering it discards important features and retains unimportant ones, which yields incorrect clustering results. To address these shortcomings, we first construct a new index, the marginal kurtosis measure (MKM), to measure feature importance in place of MVR, and then propose a novel and robust feature-reduction fuzzy c-means clustering algorithm based on this index. Experiments on synthetic and real-world datasets show that the proposed algorithm is effective and efficient.
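
    Shortcoming 1) can be illustrated with a small numerical sketch. It assumes that MVR is computed per feature as the sample mean divided by the (population) variance, and that normalization means min-max scaling to [0, 1]; the abstract does not state either detail, and the two toy features `a` and `b` are hypothetical values chosen only for illustration.

```python
import numpy as np

def mvr(x):
    """Mean-to-variance ratio of one feature (assumed form: mean / variance)."""
    return np.mean(x) / np.var(x)

def min_max(x):
    """Min-max normalization to [0, 1] (assumed normalization scheme)."""
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical toy features, not taken from the paper.
a = np.array([99.0, 100.0, 101.0])      # large mean, small spread
b = np.array([0.0, 10.0, 10.0, 10.0])   # smaller mean, larger spread

raw_a, raw_b = mvr(a), mvr(b)
norm_a, norm_b = mvr(min_max(a)), mvr(min_max(b))

# Before normalization a ranks far above b; after normalization the order flips.
print(f"raw:        MVR(a) = {raw_a:.3f}, MVR(b) = {raw_b:.3f}")
print(f"normalized: MVR(a) = {norm_a:.3f}, MVR(b) = {norm_b:.3f}")
```

    Under these assumptions the ranking of the two features by MVR reverses after normalization, which is the kind of instability the abstract describes. The example does not implement FRFCM or the proposed MKM index.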

History
  • Received: 2020-03-02
  • Last revised: 2020-09-10
  • Accepted: 2020-09-27
  • Published online:
  • Publication date: