Citation: XU W H, ZHANG B Q, LIU Y P. Monocular 6D pose estimation algorithm based on heatmap and attention mechanism[J]. Microelectronics & Computer, 2023, 40(7): 45-54. doi: 10.19304/J.ISSN1000-7180.2022.0538

Monocular 6D pose estimation algorithm based on heatmap and attention mechanism

Abstract: Monocular 6D pose estimation methods based on two-stage coordinate decoupling are stable, efficient, and fast to train, but their accuracy still leaves room for improvement. This paper proposes a monocular 6D pose estimation algorithm that combines Gaussian heatmap coordinate regression with fused attention. A fused attention module is introduced into the ResNet34 backbone network so that the network can better learn the surface features and spatial information of objects. The translation estimation network is improved with a differentiable spatial coordinate transformation so that it predicts the coordinate translation more accurately. The algorithm also applies a hierarchical density-based clustering method and builds a hash index over the point cloud to constrain the predicted 3D point cloud and effectively reduce outlying 3D sample points. During training, the LineMod dataset is augmented with synthetically rendered images to provide richer training data. Experimental results show that the method reaches 93.27% on the ADD(-S) metric and 98.81% on the 2D projection error metric, improvements of 3.41% and 0.79% respectively over the baseline method CDPN, and that it shows an overall advantage over relatively recent algorithms such as PVNet and DPOD.
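The abstract does not define the two metrics it reports, so the following is a minimal sketch of how ADD, ADD-S, and the 2D projection error are commonly computed. It is not the authors' evaluation code. The function names, the zero-skew pinhole camera matrix `K`, and the acceptance thresholds (10% of the object diameter for ADD(-S), 5 pixels for 2D projection) are assumptions based on common practice; the abstract does not state them.

```python
import math

def transform(R, t, p):
    """Apply a rigid transform (3x3 rotation R, translation t) to a 3D point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def add_error(points, R_gt, t_gt, R_pred, t_pred, symmetric=False):
    """Mean 3D distance between model points under the true and predicted poses.

    symmetric=False: ADD, which compares points with the same index.
    symmetric=True:  ADD-S, which matches each ground-truth point to its
                     nearest predicted point (used for symmetric objects).
    """
    gt = [transform(R_gt, t_gt, p) for p in points]
    pred = [transform(R_pred, t_pred, p) for p in points]
    if symmetric:
        dists = [min(math.dist(g, q) for q in pred) for g in gt]
    else:
        dists = [math.dist(g, q) for g, q in zip(gt, pred)]
    return sum(dists) / len(dists)

def project(K, R, t, p):
    """Project a model point to pixels with a pinhole camera (zero skew assumed)."""
    x, y, z = transform(R, t, p)
    return (K[0][0] * x / z + K[0][2], K[1][1] * y / z + K[1][2])

def proj2d_error(points, K, R_gt, t_gt, R_pred, t_pred):
    """Mean pixel distance between projections under the true and predicted poses."""
    gt = [project(K, R_gt, t_gt, p) for p in points]
    pred = [project(K, R_pred, t_pred, p) for p in points]
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def is_correct_add(err, diameter, ratio=0.1):
    """Assumed convention: a pose counts as correct if the error is below 10% of the diameter."""
    return err < ratio * diameter

def is_correct_proj(err_px, threshold_px=5.0):
    """Assumed convention: a pose counts as correct if the mean reprojection error is below 5 px."""
    return err_px < threshold_px
```

Under these conventions, a reported score such as 93.27% would be the fraction of test images whose predicted pose passes the corresponding check.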

     
