Citation: LIU C, XUAN S B, HE X D, et al. Semantic image segmentation based on enhanced multi-scale feature decoder[J]. Microelectronics & Computer, 2023, 40(4): 30-37. doi: 10.19304/J.ISSN1000-7180.2022.0458

Semantic image segmentation based on enhanced multi-scale feature decoder

Abstract: The semantic segmentation model SegFormer makes insufficient use of multi-scale semantic information and loses detail features during image segmentation. To address these problems, an improved lightweight semantic segmentation algorithm is proposed, with a new decoder designed to enhance multi-scale feature representation. A newly proposed bottleneck spatial pyramid pooling module (BoSPP) is adopted to obtain rich and accurate multi-scale information. A Laplacian pyramid is used to extract more precise high-resolution detail features in the encoding stage, and these features are applied in the decoding stage to counter the loss of detail. Finally, the features are fused progressively, which avoids the detail loss caused by an excessively large upsampling rate and preserves more detail for the final segmentation result. Experiments on the ADE20K dataset show that segmentation with the improved decoder gains in both accuracy and computational cost. With the MiT-B0 encoder, for example, the mIoU is 1.36% higher than that of the original network, while the number of floating-point operations is only 51% of the original. The improved model therefore raises segmentation accuracy without a large increase in computational cost, and in fact with fewer floating-point operations. It outperforms the original model, with better results on multi-scale feature representation and on image boundary details.
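The Laplacian pyramid mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, and 2x2 average pooling with nearest-neighbour upsampling stands in for the Gaussian filtering used in the classical construction. Each level stores the high-frequency residual that is lost when the image is downsampled, which is the kind of detail a decoder could reuse.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 average pooling (assumed stand-in for blur + subsample)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return (details, base): one high-frequency residual per level, plus the coarsest image."""
    details = []
    current = img.astype(np.float64)
    for _ in range(levels):
        coarse = downsample(current)
        # Residual = what the coarse image cannot represent at this resolution.
        details.append(current - upsample(coarse))
        current = coarse
    return details, current

def reconstruct(details, base):
    """Invert the pyramid by adding residuals back from coarse to fine."""
    current = base
    for detail in reversed(details):
        current = upsample(current) + detail
    return current

# Toy single-channel "image"; height and width must be divisible by 2**levels.
rng = np.random.default_rng(0)
image = rng.random((16, 16))
details, base = laplacian_pyramid(image, levels=3)
restored = reconstruct(details, base)
```

In this sketch `details[0]` has the full input resolution and carries the finest edges, and adding the residuals back to the upsampled base recovers the input up to floating-point error.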

     
