Computer Science > Computer Vision and Pattern Recognition

arXiv:2107.12270 (cs)

[Submitted on 26 Jul 2021 (v1), last revised 9 Aug 2021 (this version, v2)]

Title: Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

Authors: Juncheng Li, Siliang Tang, Linchao Zhu, Haochen Shi, Xuanwen Huang, Fei Wu, Yi Yang, Yueting Zhuang

Abstract: Video-and-Language Inference is a recently proposed task for joint video-and-language understanding. It requires a model to infer whether a natural language statement entails or contradicts a given video clip. In this paper, we study how to address three critical challenges for this task: judging the global correctness of a statement that involves multiple semantic meanings, joint reasoning over video and subtitles, and modeling long-range relationships and complex social interactions. First, we propose an adaptive hierarchical graph network that achieves in-depth understanding of the video over complex interactions. Specifically, it performs joint reasoning over video and subtitles in three hierarchies, where the graph structure is adaptively adjusted according to the semantic structures of the statement. Second, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network from three hierarchies. Semantic coherence learning can further improve the alignment between vision and linguistics, and the coherence across a sequence of video segments. Experimental results show that our method outperforms the baseline by a large margin.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2107.12270 [cs.CV]
	(or arXiv:2107.12270v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2107.12270

Submission history

From: Juncheng Li
[v1] Mon, 26 Jul 2021 15:23:19 UTC (20,668 KB)
[v2] Mon, 9 Aug 2021 08:50:13 UTC (13,923 KB)
