Supported by the National Natural Science Foundation of China (61871258), the NSFC-Xinjiang Joint Fund Key Project (U1703261), and the National Key Research and Development Program of China (2016YFB0800403)
China Three Gorges University, College of Computer and Information Technology
Supported by the National Natural Science Foundation of China (U1703261, 61871258) and the National Key Research and Development Project (2016YFB0800403)
Recurrent Neural Networks (RNNs) are artificial neural networks that take sequence data as input and recurse along the progression direction of the sequence. Owing to their high recognition accuracy, they are widely used in recognizing sequential signals such as natural language and speech. However, as network depth increases, conventional RNNs are highly prone to the vanishing gradient problem, and their weak parallelization capability makes training slow. Building on the Simple Recurrent Unit (SRU) network, which supports parallel computation, this paper introduces the connection idea of Highway-Networks and proposes the Highway Simple Recurrent Unit (H-SRU) network. On the one hand, exploiting the fact that non-saturating activation functions can effectively alleviate vanishing gradients, the activation functions of the cell state and hidden state in the original SRU structure are replaced with non-saturating ones; on the other hand, the feed-forward link idea of Highway-Networks is introduced into the SRU's cell state representation, making the network more sensitive to gradient changes. Language models are built on the PTB (Penn Treebank Dataset) and WikiText-2 datasets to verify the effectiveness of the proposed method. Experimental results show that the designed H-SRU network substantially improves performance while retaining the original training-speed advantage of the SRU. On the WikiText-2 dataset, our method achieves a perplexity (PPL) of 26.1, the best result known to date, and its efficiency is also higher than that of known non-SRU networks.
Recurrent Neural Networks (RNNs) are artificial neural networks that take sequence data as input and recurse along the progression direction of the sequence. They are widely used in natural language processing, speech recognition, and other sequence signal recognition tasks due to their high recognition accuracy. However, as the number of network layers increases, conventional RNNs are prone to the vanishing gradient problem. Additionally, network training is slow due to their poor parallelization capability. In this paper, leveraging the parallelization capability of the Simple Recurrent Unit (SRU) network and the connection strategy of Highway-Networks, we propose a Highway Simple Recurrent Unit (H-SRU). First, we replace the activation functions of the cell state and hidden state with non-saturating activation functions to effectively alleviate the vanishing gradient problem. Second, we introduce the feed-forward link idea used in Highway-Networks into the cell state representation of the SRU to make the network more sensitive to gradient changes. We built language models on the PTB (Penn Treebank Dataset) and WikiText-2 datasets to verify the effectiveness of the proposed method. The results indicate that the proposed H-SRU dramatically improves recognition performance while maintaining the high training speed of the SRU. The perplexity (PPL) of H-SRU on the WikiText-2 dataset reaches 26.1, which is the best known result to date, and its efficiency is higher than that of non-SRU networks.
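The exact H-SRU equations are not given in the abstract, so the following is only a minimal illustrative sketch of a single SRU time step in the style of the standard SRU recurrence, with the abstract's first modification applied: the saturating `tanh` on the cell state is swapped for a non-saturating ReLU. The function name `h_sru_step`, the parameter layout, and the choice of ReLU are all assumptions for illustration, not the authors' definitive formulation. Note how the candidate, forget gate, and highway gate depend on the input `x` only through matrix multiplies, which is what makes the SRU family parallelizable across time steps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def h_sru_step(x, c_prev, params):
    """One illustrative H-SRU time step (sketch; parameter layout is assumed).

    params = (W, Wf, Wr, vf, vr, bf, br):
      W, Wf, Wr -- (d, d) input projections for candidate / forget / highway gate
      vf, vr    -- (d,) elementwise recurrent weights (no recurrent matmul)
      bf, br    -- (d,) gate biases
    """
    W, Wf, Wr, vf, vr, bf, br = params
    xt = W @ x                                 # candidate: input-only matmul
    f = sigmoid(Wf @ x + vf * c_prev + bf)     # forget gate
    r = sigmoid(Wr @ x + vr * c_prev + br)     # highway (carry) gate
    c = f * c_prev + (1.0 - f) * xt            # cell state update
    # Abstract's modification (assumed form): non-saturating ReLU replaces tanh,
    # combined with a highway-style feed-forward link carrying x to the output.
    h = r * np.maximum(c, 0.0) + (1.0 - r) * x
    return h, c
```

Because the only elementwise recurrence is through `vf * c_prev` and `vr * c_prev`, the heavy matrix multiplies for an entire sequence can be batched up front, which is the training-speed advantage the abstract refers to.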