Learning to Generate and Manipulate 3D Radiance Field by a Hierarchical Diffusion Framework with CLIP Latent
dc.contributor.author | Wang, Jiaxu | en_US |
dc.contributor.author | Zhang, Ziyi | en_US |
dc.contributor.author | Xu, Renjing | en_US |
dc.contributor.editor | Chaine, Raphaëlle | en_US |
dc.contributor.editor | Deng, Zhigang | en_US |
dc.contributor.editor | Kim, Min H. | en_US |
dc.date.accessioned | 2023-10-09T07:34:01Z | |
dc.date.available | 2023-10-09T07:34:01Z | |
dc.date.issued | 2023 | |
dc.description.abstract | 3D-aware generative adversarial networks (GANs) are widely adopted for generating and editing neural radiance fields (NeRFs). However, these methods still suffer from GAN-related issues, including degraded diversity and training instability. Moreover, 3D-aware GANs treat the NeRF pipeline as a regularizer and do not operate directly on 3D assets, leading to imperfect 3D consistency. In addition, independent changes in disentangled editing cannot be guaranteed, because some shallow hidden features are shared within the generators. To address these challenges, we propose the first purely diffusion-based three-stage framework for generation and editing tasks, with a series of well-designed loss functions that can directly handle 3D models. We also present a generalizable neural point field as our 3D representation, which explicitly disentangles geometry and appearance in feature space and simplifies the dataset-preparation pipeline for 3D data conversion. Aided by this representation, our diffusion model can separately manipulate shape and appearance in a hierarchical manner via image/text prompts provided by the CLIP encoder, and it can generate new samples by adding a simple generative head. Experiments show that our approach outperforms SOTA work on the generative tasks of direct 3D-representation generation and novel image synthesis, and that it completely disentangles the manipulation of shape and appearance with correct semantic correspondence in the editing tasks. | en_US |
dc.description.number | 7 | |
dc.description.sectionheaders | Neural Rendering | |
dc.description.seriesinformation | Computer Graphics Forum | |
dc.description.volume | 42 | |
dc.identifier.doi | 10.1111/cgf.14930 | |
dc.identifier.issn | 1467-8659 | |
dc.identifier.pages | 13 pages | |
dc.identifier.uri | https://doi.org/10.1111/cgf.14930 | |
dc.identifier.uri | https://diglib.eg.org:443/handle/10.1111/cgf14930 | |
dc.publisher | The Eurographics Association and John Wiley & Sons Ltd. | en_US |
dc.subject | CCS Concepts: Computing methodologies -> Shape modeling; Image manipulation | |
dc.title | Learning to Generate and Manipulate 3D Radiance Field by a Hierarchical Diffusion Framework with CLIP Latent | en_US |
Files
Original bundle
- Name: v42i7_02_14930.pdf
- Size: 3.27 MB
- Format: Adobe Portable Document Format