The National Natural Science Foundation of China (General Program, Key Program, Major Research Plan)
For the charge prediction and law article recommendation sub-tasks of legal judgment prediction, we propose a multi-task, multi-label text classification model based on the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model and a knowledge distillation strategy. To exploit the correlation between the two sub-tasks and improve prediction accuracy, we apply multi-task learning on top of the pre-trained BERT model, yielding the text classification model BERT12multi. To address the class imbalance among charges and law articles, a grouped (hierarchical) Focal Loss is adopted, which strengthens the model's ability to distinguish rare charges and law articles. To reduce computational complexity and speed up inference, we propose a knowledge distillation strategy that uses the teacher model's evaluation as a reference: by dynamically balancing the distillation loss against the classification loss, it compresses BERT12multi into a student model with a shallow structure. The result is BERT6multi, a multi-task, multi-label text classification model that handles imbalanced samples and offers faster inference. Experiments on the CAIL2018 dataset show that the pre-trained model and the grouped Focal Loss significantly improve legal judgment prediction performance. With the teacher model's evaluation incorporated, the distilled student model infers nearly twice as fast and achieves F1-Scores (the mean of Micro-F1 and Macro-F1) of 86.7% and 83.0% on charge prediction and law article recommendation, respectively.
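The reported F1-Score is defined as the mean of Micro-F1 and Macro-F1. The following is a minimal sketch of how such a score could be computed for multi-label predictions. It is not the authors' evaluation code: the function names and the label strings are hypothetical, and Macro-F1 is assumed here to average over the labels that occur in either the gold or the predicted sets, which may differ from the official CAIL2018 scorer.

```python
from typing import Dict, List, Set


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_micro_macro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]]
) -> Dict[str, float]:
    """Return Micro-F1, Macro-F1, and their mean for multi-label outputs.

    Each element of y_true / y_pred is the set of labels for one sample.
    Assumption: Macro-F1 averages over labels seen in y_true or y_pred.
    """
    labels = sorted(set().union(*y_true, *y_pred))
    # Per-label counts stored as [true positives, false positives, false negatives].
    counts = {label: [0, 0, 0] for label in labels}
    for gold, pred in zip(y_true, y_pred):
        for label in gold & pred:
            counts[label][0] += 1
        for label in pred - gold:
            counts[label][1] += 1
        for label in gold - pred:
            counts[label][2] += 1

    # Macro-F1: unweighted mean of per-label F1, so rare labels count equally.
    per_label = [_f1(*c) for c in counts.values()]
    macro = sum(per_label) / len(per_label) if per_label else 0.0

    # Micro-F1: F1 over counts pooled across all labels.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = _f1(tp, fp, fn)

    return {"micro": micro, "macro": macro, "score": (micro + macro) / 2}
```

Because Macro-F1 weights every label equally, averaging it with Micro-F1 keeps rare charges and law articles from being drowned out by frequent ones, which is the imbalance the Focal Loss is meant to address.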