Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.23543 (cs)

[Submitted on 31 Jul 2025]

Title:ART: Adaptive Relation Tuning for Generalized Relation Prediction

Authors:Gopika Sudhakaran, Hikaru Shindo, Patrick Schramowski, Simone Schaub-Meyer, Kristian Kersting, Stefan Roth

Abstract:Visual relation detection (VRD) is the task of identifying the relationships between objects in a scene. VRD models trained solely on relation detection data struggle to generalize beyond the relations on which they are trained. While prompt tuning has been used to adapt vision-language models (VLMs) for VRD, it uses handcrafted prompts and struggles with novel or complex relations. We argue that instruction tuning offers a more effective solution by fine-tuning VLMs on diverse instructional data. We thus introduce ART, an Adaptive Relation Tuning framework that adapts VLMs for VRD through instruction tuning and strategic instance selection. By converting VRD datasets into an instruction tuning format and employing an adaptive sampling algorithm, ART directs the VLM to focus on informative relations while maintaining generalizability. Specifically, we focus on the relation classification, where subject-object boxes are given and the model predicts the predicate between them. We tune on a held-in set and evaluate across multiple held-out datasets of varying complexity. Our approach strongly improves over its baselines and can infer unseen relation concepts, a capability absent in mainstream VRD methods. We demonstrate ART's practical value by using the predicted relations for segmenting complex scenes.

Comments:	Accepted for publication in ICCV 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2507.23543 [cs.CV]
	(or arXiv:2507.23543v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.23543

Submission history

From: Gopika Sudhakaran [view email]
[v1] Thu, 31 Jul 2025 13:34:06 UTC (2,400 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:ART: Adaptive Relation Tuning for Generalized Relation Prediction

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:ART: Adaptive Relation Tuning for Generalized Relation Prediction

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators