arXiv:2507.23734 (cs)

[Submitted on 31 Jul 2025]

Title: RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping

Authors: Dongming Wu, Yanping Fu, Saike Huang, Yingfei Liu, Fan Jia, Nian Liu, Feng Dai, Tiancai Wang, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jianbing Shen

Abstract: General robotic grasping systems require accurate object affordance perception in diverse open-world scenarios following human instructions. However, current studies lack large-scale, reasoning-based affordance prediction data, which raises considerable concern about their open-world effectiveness. To address this limitation, we build a large-scale grasping-oriented affordance segmentation benchmark with human-like instructions, named RAGNet. It contains 273k images, 180 categories, and 26k reasoning instructions. The images cover diverse embodied data domains, such as wild, robot, ego-centric, and even simulation data. They are carefully annotated with an affordance map, while the difficulty of the language instructions is substantially increased by removing the category names and providing only functional descriptions. Furthermore, we propose a comprehensive affordance-based grasping framework, named AffordanceNet, which consists of a VLM pre-trained on our massive affordance data and a grasping network that conditions on an affordance map to grasp the target. Extensive experiments on affordance segmentation benchmarks and real-robot manipulation tasks show that our model has a powerful open-world generalization ability. Our data and code are available at this https URL.

Comments:	Accepted by ICCV 2025. The code is at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2507.23734 [cs.CV]
	(or arXiv:2507.23734v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.23734

Submission history

From: Dongming Wu
[v1] Thu, 31 Jul 2025 17:17:05 UTC (12,255 KB)
