ZHANG Pei-jian, TENG Qi-zhi, HE Xiao-hai. Lightweight cascade network for video frame interpolation[J]. Microelectronics & Computer, 2021, 38(3): 39-45.

Lightweight cascade network for video frame interpolation

    Abstract: Convolutional-neural-network-based video frame interpolation models typically suffer from large parameter counts, poor real-time performance, and high memory usage, which hinders their wide deployment. To address these problems, a lightweight cascaded inference network based on bidirectional optical flow and multi-scale feature fusion is proposed. The model decomposes video frame interpolation into two steps, inter-frame motion synthesis and texture reconstruction; a lightweight cascaded bidirectional optical flow prediction network is designed, and a multi-scale spatial and texture feature fusion network is proposed, so that the multi-scale texture features and complex motion features of video frames are fully extracted and exploited. The network takes two adjacent video frames and the temporal position of the desired intermediate frame as input. First, the spatial pyramid features and texture pyramid features of the two input frames are computed. The spatial pyramid features are then used to estimate multi-scale bidirectional optical flow between the two frames. Guided by this bidirectional flow, the spatial and texture features of the intermediate frame are computed. Finally, the multi-scale spatial and texture features of the intermediate frame are fused to produce the target frame. Experiments on the Vimeo90K and UCF101 datasets show that, while preserving accuracy, the proposed algorithm performs better in terms of computation speed and model parameter count.
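
    To make the pipeline concrete, below is a minimal PyTorch sketch of the cascade described above: shared spatial and texture feature pyramids for both input frames, a per-level bidirectional flow head driven by the spatial features, backward warping of the texture features to the intermediate time, and a fusion stage over the warped multi-scale features. Everything here is an illustrative assumption, not the authors' published architecture: the module names (Pyramid, Interpolator, backward_warp), the channel widths, the single-conv fusion layer, and the linear-motion approximation of the time-t flows are all placeholders.

# Illustrative sketch only: module names, channel widths, and the
# linear-motion flow approximation are assumptions, not the paper's
# actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


def backward_warp(x, flow):
    """Sample feature map x at positions displaced by a pixel-unit flow."""
    b, _, h, w = x.shape
    gy, gx = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=x.device),
        torch.linspace(-1.0, 1.0, w, device=x.device),
        indexing="ij",
    )
    grid = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel-unit displacements to normalized [-1, 1] coordinates.
    flow_n = torch.stack(
        (flow[:, 0] / ((w - 1) / 2.0), flow[:, 1] / ((h - 1) / 2.0)), dim=-1)
    return F.grid_sample(x, grid + flow_n, align_corners=True)


class Pyramid(nn.Module):
    """Shared 3-level conv encoder producing fine-to-coarse features."""
    def __init__(self, widths=(16, 32, 64)):
        super().__init__()
        blocks, in_ch = [], 3
        for c in widths:
            blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, c, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(c, c, 3, padding=1), nn.ReLU()))
            in_ch = c
        self.blocks = nn.ModuleList(blocks)

    def forward(self, img):
        feats, x = [], img
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        return feats


class Interpolator(nn.Module):
    def __init__(self, widths=(16, 32, 64)):
        super().__init__()
        self.spatial = Pyramid(widths)  # drives flow estimation
        self.texture = Pyramid(widths)  # carries appearance detail
        self.flow_heads = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * c, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 4, 3, padding=1))  # 2 ch per direction
            for c in widths)
        self.fuse = nn.Conv2d(2 * sum(widths), 3, 3, padding=1)

    def forward(self, i0, i1, t=0.5):
        s0, s1 = self.spatial(i0), self.spatial(i1)
        x0, x1 = self.texture(i0), self.texture(i1)
        warped = []
        for f0, f1, t0, t1, head in zip(s0, s1, x0, x1, self.flow_heads):
            flows = head(torch.cat((f0, f1), dim=1))
            f01, f10 = flows[:, :2], flows[:, 2:]
            # Linear-motion assumption: approximate flows from time t.
            ft0, ft1 = t * f10, (1.0 - t) * f01
            warped.append(backward_warp(t0, ft0))
            warped.append(backward_warp(t1, ft1))
        # Upsample all warped texture features to the finest scale and fuse.
        size = warped[0].shape[-2:]
        warped = [F.interpolate(w, size=size, mode="bilinear",
                                align_corners=False) for w in warped]
        out = self.fuse(torch.cat(warped, dim=1))
        out = F.interpolate(out, size=i0.shape[-2:], mode="bilinear",
                            align_corners=False)
        return torch.sigmoid(out)


# Interpolate the midpoint frame between two RGB frames.
net = Interpolator()
frame0, frame1 = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
mid = net(frame0, frame1, t=0.5)
print(mid.shape)  # torch.Size([1, 3, 128, 128])

    Separating the spatial pyramid (used only for flow estimation) from the texture pyramid (used only for reconstruction) mirrors the abstract's split between motion synthesis and texture reconstruction. A real implementation would likely refine the flows coarse-to-fine across the cascade rather than estimating each level independently as this sketch does.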
