1. Chongqing Jiaotong University; 2. Chongqing Jiaotong University
National Natural Science Foundation of China
To address the failure to identify minority-class samples of critical risk sources when deliberate-attack samples are limited and imbalanced, this paper proposes a critical risk source identification method based on multi-classifier ensemble weighted balanced distribution adaptation. First, majority-class samples are randomly drawn, while ensuring that minority-class samples are fully selected, to form the source-domain multi-sample training set; pseudo-labels are predicted directly on the target domain, and samples are assigned different weights so that minority-class samples are adequately trained. Then, a classifier is trained on the source-domain sample set, and over multiple iterations the target-domain pseudo-labels are refined and the weight matrix is updated. Finally, the selected base classifiers are combined into a strong classifier through a multi-classifier ensemble strategy, and recognition performance is evaluated with macro-averaged and micro-averaged metrics. Experiments on data from the Global Terrorism Database (GTD) show that the proposed method effectively identifies minority-class samples while maintaining overall accuracy.
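The sampling step and the two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `balanced_subsets` and `f1_scores`, the choice to draw as many samples from each larger class as the smallest class contains, and the toy 90/10 label split are all assumptions made for illustration. The weighting, pseudo-label refinement, and ensemble steps are not shown.

```python
import random
from collections import Counter, defaultdict


def balanced_subsets(labels, n_subsets, seed=0):
    """Build index lists that keep every sample of the smallest class and
    randomly draw the same number from each larger class (an assumed ratio)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        chosen = []
        for idx in by_class.values():
            # For the smallest class, sampling n_min items selects all of them.
            chosen.extend(rng.sample(idx, n_min))
        subsets.append(sorted(chosen))
    return subsets


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average per-class F1, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool counts over classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy data (hypothetical): 90 majority-class and 10 minority-class samples.
labels = [0] * 90 + [1] * 10
subsets = balanced_subsets(labels, n_subsets=3)

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100
macro, micro = f1_scores(labels, y_pred)
print(f"macro-F1 = {macro:.3f}, micro-F1 = {micro:.3f}")
```

On this toy split the always-majority predictor reaches a micro-averaged F1 of 0.900 but a macro-averaged F1 of about 0.474, which suggests why reporting both averages helps expose poor minority-class recognition that overall accuracy would hide.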