Citation: LI Xu, YAO Chun-long, FAN Feng-long, et al. Recurrent neural networks based paraphrase identification model combined with attention mechanism [J]. Control and Decision, 2021, 36(1): 152-158.
Recurrent neural networks based paraphrase identification model combined with attention mechanism
LI Xu, YAO Chun-long, FAN Feng-long, YU Xiao-qiang
(School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China)
Abstract:
Traditional deep-learning-based paraphrase identification models usually center on text representation and neglect the mining and matching of multi-granular interaction features. To address this, the text interaction space is modeled: two candidate paraphrase sentences are conditionally encoded by bidirectional long short-term memory networks, and, based on the outputs of the iterated hidden states, word-by-word soft alignment is used to reason over and obtain semantic representations of the sentence pair at multiple granularity levels (word, phrase and sentence); finally, semantic expressions from different perspectives are combined and a softmax layer performs binary classification. To overcome the shortage of labeled paraphrase training data, the model parameters are pre-trained without supervision on a language-modeling task over a corpus of more than 580,000 sentence pairs, and the pre-trained parameters are then fine-tuned with supervision on the standard data set. Compared with the previous best neural network model, the proposed model improves accuracy by 2.96% and the $F_1$ score by 2% on the standard MSRP data set. By combining global and local matching information and describing text interaction and matching patterns at multiple granularities and from multiple perspectives, the proposed model reduces the need for manual feature engineering and has good practicability.
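The final classification step described in the abstract (a non-linear projection of the sentence-pair representation followed by a two-class softmax) can be sketched as follows. This is a minimal, framework-free illustration; the function and parameter names (`classify`, `W_proj`, `W_out`) are illustrative, bias terms are omitted, and the paper's actual layer sizes and training procedure are not reproduced here.

```python
import math

def matvec(W, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify(pair_repr, W_proj, W_out):
    """Non-linear (tanh) projection of the sentence-pair representation,
    followed by a softmax over the two classes (paraphrase / non-paraphrase)."""
    hidden = [math.tanh(x) for x in matvec(W_proj, pair_repr)]
    return softmax(matvec(W_out, hidden))
```

With identity weight matrices and the pair representation `[1.0, -1.0]`, the first class receives the larger probability, since the projected score for class 0 is `tanh(1) > tanh(-1)`.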
Key words:  natural language processing  paraphrase identification  recurrent neural networks  bidirectional long short-term memory  attention mechanism  unsupervised pre-training
DOI:10.13195/j.kzyjc.2019.0638
CLC number: TP18
Funding: National Key Research and Development Program of China (2017YFC0821003-3); Basic Scientific Research Project of Liaoning Provincial Universities (2017J049); Natural Science Foundation of Liaoning Province (20180550395); Young Scientific and Technological Talents "Seedling" Project of the Education Department of Liaoning Province (J2020113).
Recurrent neural networks based paraphrase identification model combined with attention mechanism
LI Xu, YAO Chun-long, FAN Feng-long, YU Xiao-qiang
(School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China)
Abstract:
The traditional paraphrase identification models based on deep learning usually focus on text representation and ignore the mining and matching of multi-granular interaction features. To address the problem, we propose a recurrent neural network model with a word-by-word attention mechanism. In this paper, the word embeddings are fed into the recurrent neural networks, and the two candidate paraphrase sentences are conditionally encoded via two bidirectional long short-term memory networks. Based on the output of the iterated hidden states, the sentence-pair representation is obtained from global matching and fine-grained reasoning via soft alignment between the words of the two sentences. Finally, for classification, we use a softmax layer over a non-linear projection of the output vector into the target space of the two classes. The labeled training set for paraphrase identification is small in comparison with the high complexity of the task. In order to make full use of the training data, we use a language-modeling task to pre-train the neural network parameters, without supervision, on a corpus of more than 580,000 sentence pairs. This is followed by a fine-tuning stage, where we adapt the model to the specific task with labeled data. Compared with the previous state-of-the-art neural network model, the accuracy and the $F_1$ score of our model are improved by 2.96 percent and 2 percent respectively on the MSRP data set. The proposed model combines multiple semantic expressions of text from different perspectives and describes the multi-granular matching pattern. It is an end-to-end differentiable system that reduces manual feature engineering efforts, and has good practicability.
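The word-by-word soft alignment described above can be sketched as follows. This is a minimal, framework-free illustration assuming plain dot-product similarity between the encoders' hidden states; the paper's actual scoring function, parameters, and BiLSTM encoder are not reproduced here, and the names `word_by_word_attention`, `states_a`, `states_b` are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def word_by_word_attention(states_a, states_b):
    """Softly align each hidden state of sentence B to the hidden states
    of sentence A. Returns the attended (context) vectors for B's words
    and the corresponding attention-weight rows."""
    contexts, weights = [], []
    dim = len(states_a[0])
    for h_b in states_b:
        # Unnormalized alignment scores: dot-product similarity with each word of A.
        scores = [dot(h_a, h_b) for h_a in states_a]
        alpha = softmax(scores)
        # Context vector: attention-weighted sum over A's hidden states.
        ctx = [sum(a * h[i] for a, h in zip(alpha, states_a)) for i in range(dim)]
        contexts.append(ctx)
        weights.append(alpha)
    return contexts, weights
```

For example, aligning a word of B whose state matches A's first hidden state yields an attention row that sums to 1 and concentrates its mass on that first position, which is the fine-grained (word-level) matching signal the model aggregates across granularities.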
Key words:  natural language processing  paraphrase identification  recurrent neural networks  bidirectional long short-term memory  attention mechanism  unsupervised pre-training