Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.23785 (cs)

[Submitted on 31 Jul 2025]

Title:Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Authors:Bowen Zhang, Sicheng Xu, Chuxin Wang, Jiaolong Yang, Feng Zhao, Dong Chen, Baining Guo

Abstract:In this paper, we present a novel framework for video-to-4D generation that creates high-quality dynamic 3D content from single video inputs. Direct 4D diffusion modeling is extremely challenging due to costly data construction and the high-dimensional nature of jointly representing 3D shape, appearance, and motion. We address these challenges by introducing a Direct 4DMesh-to-GS Variation Field VAE that directly encodes canonical Gaussian Splats (GS) and their temporal variations from 3D animation data without per-instance fitting, and compresses high-dimensional animations into a compact latent space. Building upon this efficient representation, we train a Gaussian Variation Field diffusion model with temporal-aware Diffusion Transformer conditioned on input videos and canonical GS. Trained on carefully-curated animatable 3D objects from the Objaverse dataset, our model demonstrates superior generation quality compared to existing methods. It also exhibits remarkable generalization to in-the-wild video inputs despite being trained exclusively on synthetic data, paving the way for generating high-quality animated 3D content. Project page: this https URL.

Comments:	ICCV 2025. Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.23785 [cs.CV]
	(or arXiv:2507.23785v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.23785

Submission history

From: Bowen Zhang [view email]
[v1] Thu, 31 Jul 2025 17:59:51 UTC (14,621 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators