YANG You, FANG Xiaolong, DENG Yi, WU Chunyan, YAO Lu. Visual commonsense and attention for image captioning[J]. Microelectronics & Computer, 2022, 39(6): 51-59. DOI: 10.19304/J.ISSN1000-7180.2021.1226

Visual commonsense and attention for image captioning

  • Image captioning aims to have a computer automatically generate a natural-language description of a given image. The task spans computer vision and natural language processing, and it can be applied to retrieval systems, navigation aids for the blind, and medical report generation. Existing image captioning models do not sufficiently mine the semantic relations among visual elements, and the features modeled by multi-level attention mechanisms suffer from attention deviation. To address these problems, a visual commonsense and attention model for image captioning is proposed. The model follows the encoder-decoder framework. In the encoder, visual commonsense is introduced to guide the local features in generating commonsense semantic relations: Faster R-CNN and VC R-CNN are used to extract the local features and the visual commonsense features, respectively, and attention on attention (AoA) is applied to the high-level semantics mined by multi-layer attention. This enhances the features, yields better relevance, and reduces the attention deviation that would otherwise mislead sequence generation in the decoder. In the decoder, an attention mechanism selects relevant information by weighting the features, and an LSTM together with a Gated Linear Unit (GLU) generates the output sequence. The model was evaluated on the MS COCO dataset, and the experimental results show that BLEU, METEOR, ROUGE-L, CIDEr and SPICE improved to some extent, indicating that the model expresses the semantic content of images more accurately and more richly.
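To make the attention components named in the abstract more concrete, here is a minimal pure-Python sketch of the generic techniques, not the authors' implementation. It shows scaled dot-product attention (weighting region features by relevance to a query), the commonly used Attention on Attention gate (an information vector i = W_q^i q + W_v^i v̂ multiplied element-wise by a gate g = σ(W_q^g q + W_v^g v̂)), and a Gated Linear Unit (W x) ⊙ σ(V x). The helper names, the weight names `Wqi`, `Wvi`, `Wqg`, `Wvg`, the omission of bias terms, and the toy dimensions are all assumptions for illustration; how the paper wires these blocks together with VC R-CNN features and the LSTM is not reproduced here.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i] is the i-th output row


def matvec(W: Matrix, x: Vector) -> Vector:
    """Return W @ x for a row-major matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise vector sum."""
    return [x + y for x, y in zip(a, b)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention: weight each value by softmax(q.k / sqrt(d))."""
    d = len(q)
    weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def aoa(q: Vector, keys: List[Vector], values: List[Vector],
        Wqi: Matrix, Wvi: Matrix, Wqg: Matrix, Wvg: Matrix) -> Vector:
    """Attention on Attention (bias terms omitted, an assumption for brevity)."""
    v_hat = attend(q, keys, values)                       # first-level attention result
    info = add(matvec(Wqi, q), matvec(Wvi, v_hat))        # information vector i
    gate = [sigmoid(z) for z in add(matvec(Wqg, q), matvec(Wvg, v_hat))]  # gate g
    return [g * i for g, i in zip(gate, info)]            # element-wise g * i


def glu(x: Vector, W: Matrix, V: Matrix) -> Vector:
    """Gated Linear Unit: (W x) * sigmoid(V x), element-wise (biases omitted)."""
    return [a * sigmoid(b) for a, b in zip(matvec(W, x), matvec(V, x))]


if __name__ == "__main__":
    I2 = [[1.0, 0.0], [0.0, 1.0]]   # toy identity weights
    Z2 = [[0.0, 0.0], [0.0, 0.0]]   # zero gate weights, so every gate equals 0.5
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical region features
    q = [1.0, 0.0]                                   # hypothetical decoder query
    print(aoa(q, regions, regions, I2, I2, Z2, Z2))
    print(glu([2.0, -4.0], I2, Z2))
```

With zero gate weights every gate value is σ(0) = 0.5, so the AoA output is simply half of q + v̂ and the GLU halves its input; learned gate weights would instead let the model suppress attended content that is irrelevant to the query, which is the mechanism the abstract relies on to reduce attention deviation.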