Data distribution-based Cost-sensitive Broad Learning System
Affiliation:

1. Hunan Normal University; 2. Central South University

CLC number:

TP391

Fund project:

National Natural Science Foundation of China (General Program, Key Program, Major Program)

    Abstract:

    Broad Learning System (BLS) is a flexible modeling framework and a potential alternative to deep neural networks. Because it supports fast, adaptive model structure selection and online incremental learning, BLS is regarded as a promising technology in knowledge discovery and data engineering. However, traditional BLS mainly targets pattern classification tasks in which the data are approximately evenly distributed and all misclassification costs are equal. In most real applications, such as network intrusion detection, medical diagnosis, and credit card fraud detection, the data are imbalanced. This paper proposes a Data distribution-based Cost-sensitive-BLS (DDbCs-BLS) for pattern classification tasks with imbalanced data and class-dependent misclassification costs. DDbCs-BLS takes the statistical distribution of the data into account when searching for the best classification boundary of a cost-sensitive BLS classifier, so that information from minority-class samples is not lost, which improves the classification performance of BLS across data sets. Extensive validation and comparison experiments on multiple public data sets (both balanced and imbalanced) show that DDbCs-BLS effectively determines the best position of the classification boundary and achieves better classification performance on both balanced and imbalanced data sets.
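
To make the idea of a cost-sensitive BLS readout concrete, the following is a minimal sketch and not the DDbCs-BLS algorithm itself, whose details are not given in the abstract. It assumes a toy BLS with random linear feature nodes, `tanh` enhancement nodes, and a ridge-regression output layer, and it assumes that misclassification costs are set by inverse class frequency. The names `class_weights` and `CostSensitiveBLS`, and all hyperparameters, are hypothetical.

```python
import numpy as np


def class_weights(y):
    """Assumed cost rule: inverse class frequency, so each class has equal total weight."""
    classes, counts = np.unique(y, return_counts=True)
    per_class = len(y) / (len(classes) * counts)
    return per_class[np.searchsorted(classes, y)]


class CostSensitiveBLS:
    """Toy BLS: random feature and enhancement nodes with a weighted ridge readout."""

    def __init__(self, n_feature=20, n_enhance=40, reg=1e-2, seed=0):
        self.n_feature, self.n_enhance, self.reg = n_feature, n_enhance, reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        Z = X @ self.Wf + self.bf             # linear feature nodes
        H = np.tanh(Z @ self.We + self.be)    # nonlinear enhancement nodes
        return np.hstack([Z, H])

    def fit(self, X, y):
        d = X.shape[1]
        self.Wf = self.rng.standard_normal((d, self.n_feature))
        self.bf = self.rng.standard_normal(self.n_feature)
        self.We = self.rng.standard_normal((self.n_feature, self.n_enhance))
        self.be = self.rng.standard_normal(self.n_enhance)
        self.classes_ = np.unique(y)
        # One-hot targets, one column per class
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        A = self._hidden(X)
        Aw = A * class_weights(y)[:, None]    # rows scaled by per-sample cost
        # Weighted ridge regression: (A^T C A + reg * I) W = A^T C T
        lhs = A.T @ Aw + self.reg * np.eye(A.shape[1])
        self.W = np.linalg.solve(lhs, Aw.T @ T)
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.W, axis=1)]
```

With these weights, errors on minority-class samples contribute more to the squared loss, which shifts the decision boundary away from the minority class. A distribution-aware method such as the one described above would presumably choose the costs from richer statistics than class counts alone.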

History
  • Received: 2019-10-23
  • Last revised: 2021-01-25
  • Accepted: 2020-02-29