Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.23567 (cs)

[Submitted on 31 Jul 2025]

Title: 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection

Authors: Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu, Siyuan Li, Rui Huang, Yuqian Fu, Marc Pollefeys, Hermann Blum, Zuria Bauer


Abstract: Monocular 3D object detection is valuable for applications such as robotics and AR/VR. Existing methods are confined to closed-set settings, where the training and testing sets consist of the same scenes and/or object categories. However, real-world applications often introduce new environments and novel object categories, posing a challenge to these methods. In this paper, we address monocular 3D object detection in an open-set setting and introduce the first end-to-end 3D Monocular Open-set Object Detector (3D-MOOD). We propose to lift open-set 2D detections into 3D space through our designed 3D bounding box head, enabling end-to-end joint training of the 2D and 3D tasks for better overall performance. We condition the object queries on a geometry prior to improve the generalization of 3D estimation across diverse scenes. To further improve performance, we design a canonical image space for more efficient cross-dataset training. We evaluate 3D-MOOD in both closed-set settings (Omni3D) and open-set settings (Omni3D to Argoverse 2 and ScanNet), and achieve new state-of-the-art results. Code and models are available at this http URL.
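
As an illustration of what "lifting" a 2D detection into 3D involves geometrically, the sketch below unprojects a 2D box center into camera coordinates with a standard pinhole model. This is a generic textbook operation, not the paper's 3D bounding box head: the function names, the intrinsics (fx, fy, cx, cy), and the assumption that a depth value for the box center is already available are all hypothetical choices made for this example.

```python
# Minimal sketch (assumption: ideal pinhole camera, no lens distortion).
# All names and values here are illustrative, not taken from the 3D-MOOD code.

def unproject_center(u, v, depth, fx, fy, cx, cy):
    """Map a pixel (u, v) with a known depth to a 3D point in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project_point(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point back to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical 2D box center at pixel (400, 300), estimated depth 10 m.
center_3d = unproject_center(400.0, 300.0, 10.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
center_2d = project_point(center_3d, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Projecting the recovered 3D point back into the image returns the original pixel, which is a quick sanity check on the intrinsics. A full 3D box additionally needs dimensions and orientation, which this sketch does not cover.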

Comments:	ICCV 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.23567 [cs.CV]
	(or arXiv:2507.23567v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.23567

Submission history

From: Yung-Hsu Yang
[v1] Thu, 31 Jul 2025 13:56:41 UTC (18,293 KB)
