3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

Anonymous Authors


Abstract

Text-driven 3D scene generation techniques have made rapid progress in recent years, benefiting from the development of diffusion models. Their success is mainly attributed to using existing diffusion models to iteratively perform image warping and inpainting to generate 3D scenes. However, these methods heavily rely on the outputs of existing models, leading to error accumulation in geometry and appearance that prevents them from being used in various scenarios (e.g., outdoor and unreal scenarios). To address this limitation, we generatively refine newly generated local views by querying and aggregating global 3D information, and then progressively generate the 3D scene. To this end, we first employ a tri-plane feature-based NeRF as a unified representation of the 3D scene to enforce global consistency. We then propose a generative refinement network that synthesizes new content of higher quality by exploiting both the natural image prior of a 2D diffusion model and the global 3D information of the current scene. Extensive experiments demonstrate that, compared to previous methods, our approach supports a wider variety of scenes and arbitrary camera trajectories with improved visual quality and 3D consistency. Code will be released.
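
To make the unified representation concrete, below is a minimal, illustrative sketch of a tri-plane feature NeRF query in PyTorch. All class names, dimensions, and the decoder architecture here are our own assumptions for exposition, not the released implementation.

```python
# Hypothetical sketch of a tri-plane feature NeRF query (PyTorch).
# Names and dimensions are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriPlaneNeRF(nn.Module):
    def __init__(self, feat_dim=32, res=256):
        super().__init__()
        # Three learnable axis-aligned feature planes: XY, XZ, YZ.
        self.planes = nn.Parameter(torch.randn(3, feat_dim, res, res) * 0.01)
        # A small MLP decodes the aggregated plane features into density + RGB.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 4),  # (sigma, r, g, b)
        )

    def forward(self, xyz):
        # xyz: (N, 3) points normalized to [-1, 1]^3.
        coords = [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]]
        feats = 0
        for plane, uv in zip(self.planes, coords):
            # Bilinearly sample each plane at the projected 2D coordinate.
            grid = uv.view(1, -1, 1, 2)                 # (1, N, 1, 2)
            sampled = F.grid_sample(plane.unsqueeze(0), grid,
                                    align_corners=True)  # (1, C, N, 1)
            feats = feats + sampled.view(plane.shape[0], -1).t()  # (N, C)
        out = self.decoder(feats)
        sigma = F.relu(out[:, :1])       # non-negative density
        rgb = torch.sigmoid(out[:, 1:])  # color in [0, 1]
        return sigma, rgb
```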


Overview of our pipeline. (a) Scene Context Initialization maintains a supporting database that provides novel-viewpoint data for progressive generation. (b) Unified 3D Representation provides a single representation of the generated scene, allowing our approach to generate diverse scenes while maintaining 3D consistency. (c) 3D-Aware Generative Refinement alleviates the error-accumulation issue during long-term extrapolation by exploiting the natural image prior learned from large-scale data to generatively refine each synthesized novel-viewpoint image. A consistency regularization module is used for test-time optimization.
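
The loop below sketches how these three modules could interact during progressive generation. The `scene` and `diffusion` objects and all method names are hypothetical stand-ins for the paper's components, written under our own assumptions; they do not describe an actual released API.

```python
# Illustrative sketch of the progressive generation loop described above.
# `scene`, `diffusion`, and all method names are hypothetical stand-ins.
def generate_scene(prompt, camera_trajectory, scene, diffusion):
    for cam in camera_trajectory:
        # (a) Render the current scene state at the new viewpoint;
        #     previously unseen regions come back flagged in a hole mask.
        rendered, hole_mask = scene.render(cam)
        # Fill the holes with a text-conditioned 2D diffusion inpainter.
        inpainted = diffusion.inpaint(rendered, hole_mask, prompt)
        # (c) 3D-aware generative refinement: correct accumulated error in
        #     the new view using features queried from the global tri-plane
        #     representation of the scene generated so far.
        refined = diffusion.refine(inpainted, scene.query_features(cam))
        # (b) Fuse the refined view back into the unified 3D representation
        #     (test-time optimization with consistency regularization).
        scene.update(refined, cam)
    return scene
```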


Poster

BibTeX

The BibTeX entry will be provided upon publication.