An Image Understanding Method Based on Multi-level Semantic Features
Affiliation:

Harbin Engineering University

CLC number:

TP181

Fund Project:

National Key R&D Program of China, New Generation Artificial Intelligence 2030 Major Project (No. 2018AAA0102702)

    Abstract:

    Visual scene understanding involves detecting and recognizing objects, inferring the visual relationships between the detected objects, and describing image regions with sentences. To understand scene images more comprehensively and accurately, we treat object detection, visual relationship detection, and image captioning as three visual tasks at different semantic levels of scene understanding, and we propose an image understanding model based on multi-level semantic features that connects these three semantic levels so that the tasks are solved jointly. The model uses a message-passing graph to iteratively and simultaneously update the semantic features of objects, relationship phrases, and image captions. The updated semantic features are then used to classify objects and visual relationships and to generate scene graphs and captions, and a fusion attention mechanism is introduced to improve caption accuracy. Experimental results on the Visual Genome and COCO datasets show that the proposed method outperforms existing methods on the scene graph generation and image captioning tasks.
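
The abstract does not state the update rule used in the message-passing graph, so the following is only a minimal sketch of how features at three semantic levels (objects, relationship phrases, captioned regions) might exchange information over several iterations. Everything here is an assumption made for illustration: the function names `blend` and `pass_messages`, the `phrase_links` and `region_links` structures, the plain averaging of neighbor features, and the `rate` parameter. It is not the authors' model, which presumably uses learned transformations instead of a fixed average.

```python
from statistics import fmean


def blend(feature, messages, rate):
    """Move `feature` toward the mean of `messages`; unchanged if there are none."""
    if not messages:
        return list(feature)
    mean = [fmean(column) for column in zip(*messages)]
    return [x + rate * (m - x) for x, m in zip(feature, mean)]


def pass_messages(objects, phrases, regions, phrase_links, region_links,
                  steps=2, rate=0.5):
    """Hypothetical three-level message passing (illustrative only).

    objects, phrases, regions: lists of equal-length feature vectors.
    phrase_links[p] = (subject_index, object_index) for phrase p.
    region_links[r] = list of phrase indices covered by region r.
    All three levels are updated together from the previous step's values.
    """
    for _ in range(steps):
        # Each object listens to every phrase it takes part in.
        new_objects = [
            blend(obj,
                  [phrases[p] for p, pair in enumerate(phrase_links) if i in pair],
                  rate)
            for i, obj in enumerate(objects)
        ]
        # Each phrase listens to its two objects and to the regions covering it.
        new_phrases = [
            blend(phr,
                  [objects[s], objects[o]]
                  + [regions[r] for r, members in enumerate(region_links)
                     if p in members],
                  rate)
            for p, (phr, (s, o)) in enumerate(zip(phrases, phrase_links))
        ]
        # Each region listens to the phrases it covers.
        new_regions = [
            blend(reg, [phrases[p] for p in members], rate)
            for reg, members in zip(regions, region_links)
        ]
        objects, phrases, regions = new_objects, new_phrases, new_regions
    return objects, phrases, regions


# Toy input: two objects, one phrase linking them, one region covering the phrase.
toy_objects = [[1.0, 0.0], [0.0, 1.0]]
toy_phrases = [[0.0, 0.0]]
toy_regions = [[2.0, 2.0]]
toy_result = pass_messages(toy_objects, toy_phrases, toy_regions,
                           phrase_links=[(0, 1)], region_links=[[0]], steps=1)
```

In this sketch the updated features in `toy_result` would then feed separate heads for object classification, relationship classification, and caption generation; those heads and the fusion attention mechanism are not shown because the abstract gives no detail about them.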

History
  • Received: 2020-07-09
  • Last revised: 2020-09-22
  • Accepted: 2020-09-25