Harbin University of Science and Technology
Chinese character detection in natural scenes suffers from low detection rates, difficulty detecting small characters, and a large number of character categories. To address these problems, this paper proposes an improved method based on YOLOv2 and applies it to Chinese character detection in natural scenes. First, the K-means++ clustering algorithm is used to analyze the number and the width-height dimensions of the candidate boxes (anchors) for character targets; the number of anchors is increased, and 6 anchors of different sizes are selected so that the candidate boxes are better suited to character detection. Second, a multi-layer feature fusion strategy is proposed: the feature map output before the fourth max-pooling layer of the original network is convolved with 3×3 and 1×1 kernels and downsampled by a factor of 4 to obtain local features, and the feature map output before the fifth max-pooling layer is convolved with 3×3 and 1×1 kernels and downsampled by a factor of 2 to obtain local features. These local features are fused with the global features, which strengthens the network's extraction of local features and improves its detection accuracy on small character targets. In addition, the number of consecutive, repeated 3×3×1024 convolutional layers in the high-level part of the network is increased from 3 to 5 to better handle the many character categories. Finally, YOLOv2 and the improved YOLOv2 are compared experimentally on the Chinese Text in the Wild (CTW) dataset. The results show that the improved YOLOv2 achieves a mean average precision (mAP) of 78.3% for Chinese character detection, 7.3% higher than the original YOLOv2 and clearly higher than other methods for detecting Chinese characters in natural scenes.
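The anchor-selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the 1 − IoU distance between (width, height) pairs that is commonly used for YOLOv2 anchor clustering, a mean-based centroid update, and synthetic box sizes in place of the CTW annotations. The function names `iou_wh` and `kmeanspp_anchors` are hypothetical.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, treating all boxes as sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeanspp_anchors(boxes, k=6, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors; assumes >= k distinct box shapes."""
    rng = np.random.default_rng(seed)
    # K-means++ seeding: the first centroid is drawn uniformly, each later one
    # with probability proportional to the squared distance (1 - IoU) to the
    # nearest centroid chosen so far.
    centroids = boxes[rng.integers(len(boxes))][None, :]
    while len(centroids) < k:
        d = (1.0 - iou_wh(boxes, centroids)).min(axis=1)
        probs = d ** 2 / np.sum(d ** 2)
        pick = rng.choice(len(boxes), p=probs)
        centroids = np.vstack([centroids, boxes[pick]])
    # Lloyd iterations: assign each box to the centroid with the highest IoU,
    # then move each centroid to the mean (w, h) of its members.
    for _ in range(iters):
        assign = iou_wh(boxes, centroids).argmax(axis=1)
        new = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Return anchors sorted by area, smallest first.
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]

# Synthetic (w, h) pairs in pixels stand in for real character boxes.
demo_rng = np.random.default_rng(1)
demo_boxes = demo_rng.uniform(8, 120, size=(500, 2))
demo_anchors = kmeanspp_anchors(demo_boxes, k=6)
demo_mean_iou = iou_wh(demo_boxes, demo_anchors).max(axis=1).mean()
print(demo_anchors)
print("mean best IoU:", demo_mean_iou)
```

Running the same routine for several values of `k` and comparing the mean best IoU is one plausible way to choose the anchor count; the abstract reports that 6 anchors were selected but does not state the selection criterion.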