Chinese text detection for natural scenes

JIA Ying (贾颖); CHENG Yanyun (程艳云)

doi:10.19304/J.ISSN1000-7180.2021.0897


Abstract

With the development of deep learning, text detection in natural scenes has advanced, but detection of multi-oriented and curved Chinese text is still unsatisfactory. To address this problem, a multi-scale text detection method that incorporates an attention mechanism is proposed. To balance model accuracy against computational complexity, a lightweight ResNet18 is adopted as the backbone. Because the distribution of the features extracted by the feature pyramid network (FPN) is uncertain, a balanced attention mechanism (BAM) is embedded to extract effective text features and suppress inefficient feature channels, which improves the robustness of the detector. Because atrous spatial pyramid pooling (ASPP) loses local and detail information of the image during downsampling, ASPP is modified to reduce the loss of feature-map resolution. Because the FPN features are insufficient and its receptive field is small, the attention-embedded FPN and the improved ASPP are run in parallel and their features are fused to strengthen feature extraction. To address the imbalance between positive and negative samples, a logarithmic AC Loss is introduced into the binary-map loss of the differentiable binarization module, which improves the generalization ability of the model. Experiments on the public MSRA-TD500 dataset show that, compared with the fast and efficient DBNet, the proposed algorithm improves precision, recall and F-measure by 0.1%, 1.4% and 0.6% respectively, while also achieving good detection speed.
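The abstract says BAM suppresses inefficient feature channels but does not describe its internals. As a hedged illustration of channel gating in general, the sketch below implements a squeeze-and-excitation style gate (global average pooling, FC, ReLU, FC, sigmoid, rescale). This is a swapped-in stand-in, not the authors' BAM; the function name, weight shapes and the omission of biases are assumptions made for brevity.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel gate (hypothetical stand-in for BAM).

    feature_map: list of C channels, each a 2-D list (H x W).
    w1: C_r x C weights of the reduction layer.
    w2: C x C_r weights of the expansion layer (biases omitted).
    Returns the reweighted feature map and the per-channel gates in (0, 1).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every activation of channel c by its gate.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, feature_map)]
    return scaled, gates


fmap = [[[4.0, 4.0], [4.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled, gates = channel_attention(fmap, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
print([round(g, 4) for g in gates])
```

With these hand-picked weights the first channel is kept almost unchanged and the second is driven toward zero, which is the "suppress inefficient channels" behaviour in miniature; in a real network the weights are learned.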
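ASPP runs several atrous (dilated) convolutions in parallel. A k-tap kernel with dilation rate d spans k + (k − 1)(d − 1) input positions, so the receptive field grows without downsampling. The 1-D sketch below shows this on plain lists; the rates (1, 6, 12, 18) are the DeepLabv3 defaults, assumed here because the abstract does not give the modified ASPP's rates, and the 1×1 and image-pooling branches of DeepLabv3's ASPP are omitted.

```python
def effective_kernel_size(k, d):
    """Number of input positions spanned by a k-tap kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(x, w, d):
    """1-D atrous convolution (cross-correlation) with zero 'same' padding.

    Output length equals input length, so resolution is preserved while
    the receptive field grows with d. Assumes an odd kernel length.
    """
    k = len(w)
    half = (k - 1) // 2 * d  # zeros needed on each side for a 'same' output
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(w[j] * padded[i + j * d] for j in range(k))
            for i in range(len(x))]


def aspp_1d(x, w, rates=(1, 6, 12, 18)):
    """Parallel atrous branches, concatenated per position (simplified ASPP)."""
    branches = [dilated_conv1d(x, w, d) for d in rates]
    return [list(feats) for feats in zip(*branches)]


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
print(dilated_conv1d(impulse, [1, 1, 1], 2))
print([effective_kernel_size(3, d) for d in (1, 6, 12, 18)])
```

Feeding an impulse shows which input positions each output sees: with d = 2 a 3-tap kernel touches positions two apart, covering a span of 5 while the output keeps all 9 positions.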
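The method builds on the differentiable binarization (DB) module of DBNet. In DBNet, the hard threshold (B = 1 if P ≥ T, else 0) is replaced by the approximate step function B̂ = 1 / (1 + e^(−k(P − T))) with amplification factor k = 50, so gradients reach both the probability map P and the threshold map T. The sketch below implements that published formula on nested lists; the function names are hypothetical, and the abstract states neither the k used by the authors nor how the logarithmic AC Loss is attached to the binary map.

```python
import math


def differentiable_binarize(prob, thresh, k=50.0):
    """Approximate step function from DBNet: B = 1 / (1 + exp(-k * (P - T))).

    prob and thresh are same-shaped 2-D lists (probability map P and
    threshold map T); k is the amplification factor (50 in DBNet).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


def hard_binarize(prob, thresh):
    """Standard, non-differentiable binarization, for comparison."""
    return [[1.0 if p >= t else 0.0 for p, t in zip(p_row, t_row)]
            for p_row, t_row in zip(prob, thresh)]


P = [[0.9, 0.55, 0.2]]
T = [[0.3, 0.5, 0.3]]
soft = differentiable_binarize(P, T)
print([[round(v, 3) for v in row] for row in soft])  # [[1.0, 0.924, 0.007]]
print(hard_binarize(P, T))                           # [[1.0, 1.0, 0.0]]
```

Far from the threshold the soft map matches the hard one; near it (P − T = 0.05) the output is a smooth value, which is what makes the binarization trainable end to end.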
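Results on MSRA-TD500 are reported as precision, recall and F-measure, where F is the harmonic mean 2PR / (P + R). The sketch below computes all three from matched-box counts; the counts are made-up illustration values, and the box-matching rule (typically an overlap criterion) is assumed to have been applied already.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall and F-measure from matched-box counts.

    tp: detections matched to a ground-truth box
    fp: detections with no matching ground truth
    fn: ground-truth boxes left unmatched
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


p, r, f = detection_scores(tp=80, fp=20, fn=40)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.8 0.6667 0.7273
```

Because F is a harmonic mean, a change in F is a weighted combination of the changes in precision and recall, so an F gain need not equal either individual gain.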
