Browsing by Author "Theobalt, Christian"
Now showing 1 - 11 of 11
Item Advances in Neural Rendering (The Eurographics Association and John Wiley & Sons Ltd., 2022)
Tewari, Ayush; Thies, Justus; Mildenhall, Ben; Srinivasan, Pratul; Tretschk, Edith; Wang, Yifan; Lassner, Christoph; Sitzmann, Vincent; Martin-Brualla, Ricardo; Lombardi, Stephen; Simon, Tomas; Theobalt, Christian; Nießner, Matthias; Barron, Jon T.; Wetzstein, Gordon; Zollhöfer, Michael; Golyanik, Vladislav; Meneveaux, Daniel; Patanè, Giuseppe
Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take specifically defined representations of geometry and material properties as input. Collectively, these inputs define the actual scene and what is rendered, and are referred to as the scene representation (where a scene consists of one or more objects). Example scene representations are triangle meshes with accompanying textures (e.g., created by an artist), point clouds (e.g., from a depth sensor), volumetric grids (e.g., from a CT scan), or implicit surface functions (e.g., truncated signed distance fields). The reconstruction of such a scene representation from observations using differentiable rendering losses is known as inverse graphics or inverse rendering. Neural rendering is closely related: it combines ideas from classical computer graphics and machine learning to create algorithms for synthesizing images from real-world observations. Neural rendering is a leap forward towards the goal of synthesizing photo-realistic image and video content. In recent years, we have seen immense progress in this field through hundreds of publications that show different ways to inject learnable components into the rendering pipeline.
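One of the scene representations listed above, the truncated signed distance field, is easy to illustrate concretely. A minimal sketch, assuming a unit sphere and a truncation band of 0.1 (both hypothetical choices for illustration, not taken from any of the papers listed here):

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(p, center) - radius

def truncate(d, tau=0.1):
    """Truncated SDF: clamp the distance to [-tau, tau], so only a narrow band
    around the surface carries meaningful values (as in TSDF fusion pipelines)."""
    return max(-tau, min(tau, d))
```

For example, a point two units from the origin lies at distance 1.0 from the unit sphere's surface, which truncation clamps to 0.1; points well inside clamp to -0.1.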
This state-of-the-art report on advances in neural rendering focuses on methods that combine classical rendering principles with learned 3D scene representations, often now referred to as neural scene representations. A key advantage of these methods is that they are 3D-consistent by design, enabling applications such as novel viewpoint synthesis of a captured scene. In addition to methods that handle static scenes, we cover neural scene representations for modeling non-rigidly deforming objects, as well as scene editing and composition. While most of these approaches are scene-specific, we also discuss techniques that generalize across object classes and can be used for generative tasks. In addition to reviewing these state-of-the-art methods, we provide an overview of fundamental concepts and definitions used in the current literature. We conclude with a discussion on open challenges and social implications.

Item EUROGRAPHICS 2018: CGF 37-2 STARs Frontmatter (The Eurographics Association and John Wiley & Sons Ltd., 2018)
Hildebrandt, Klaus; Theobalt, Christian

Item EUROGRAPHICS 2023: CGF 42-2 STARs Frontmatter (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Bousseau, Adrien; Theobalt, Christian

Item HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow (The Eurographics Association, 2022)
Wang, Jiayi; Luvizon, Diogo; Mueller, Franziska; Bernard, Florian; Kortylewski, Adam; Casas, Dan; Theobalt, Christian; Bender, Jan; Botsch, Mario; Keim, Daniel A.
Reconstructing two-hand interactions from a single image is a challenging problem due to ambiguities that stem from projective geometry and heavy occlusions. Existing methods are designed to estimate only a single pose, despite the fact that other valid reconstructions exist that fit the image evidence equally well.
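The projective ambiguity behind this observation can be seen directly in a pinhole camera model: distinct 3D points along one viewing ray map to the same pixel, so image evidence alone cannot separate them. A minimal sketch (the focal length and points are illustrative, not from the paper):

```python
def project(point, f=1.0):
    """Pinhole projection of a 3D camera-space point (x, y, z) onto the image plane."""
    x, y, z = point
    return (f * x / z, f * y / z)

# Two candidate joint positions at different depths along the same viewing ray...
near = (0.5, 0.25, 1.0)
far = (1.0, 0.5, 2.0)   # = near scaled by 2, i.e. the same ray through the camera center

# ...produce identical image evidence, so a single image cannot distinguish them.
assert project(near) == project(far)
```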
In this paper, we propose to address this issue by explicitly modeling the distribution of plausible reconstructions in a conditional normalizing flow framework. This allows us to directly supervise the posterior distribution through a novel determinant magnitude regularization, which is key to producing varied 3D hand pose samples that project well into the input image. We also demonstrate that metrics commonly used to assess reconstruction quality are insufficient for evaluating pose predictions under such severe ambiguity. To address this, we release MultiHands, the first dataset with multiple plausible annotations per image. The additional annotations enable us to evaluate the estimated distribution using the maximum mean discrepancy metric. Through this, we demonstrate the quality of our probabilistic reconstructions and show that explicit ambiguity modeling is better suited to this challenging problem.

Item HDHumans: A Hybrid Approach for High-fidelity Digital Humans (ACM Association for Computing Machinery, 2023)
Habermann, Marc; Liu, Lingjie; Xu, Weipeng; Pons-Moll, Gerard; Zollhoefer, Michael; Theobalt, Christian; Wang, Huamin; Ye, Yuting; Zordan, Victor
Photo-real digital human avatars are of enormous importance in graphics, as they enable immersive communication across the globe, improve gaming and entertainment experiences, and can be particularly beneficial for AR and VR settings. However, current avatar generation approaches either fall short in high-fidelity novel view synthesis, generalization to novel motions, or reproduction of loose clothing, or they cannot render characters at the high resolution offered by modern displays. To this end, we propose HDHumans, the first method for HD human character synthesis that jointly produces an accurate and temporally coherent 3D deforming surface and highly photo-realistic images of arbitrary novel views and of motions not seen at training time.
At the technical core, our method tightly integrates a classical deforming character template with neural radiance fields (NeRF), and is carefully designed to achieve a synergy between the two. First, the template guides the NeRF, which allows synthesizing novel views of a highly dynamic and articulated character and even enables the synthesis of novel motions. Second, we leverage the dense point clouds resulting from the NeRF to further improve the deforming surface via 3D-to-3D supervision. We outperform the state of the art quantitatively and qualitatively in terms of synthesis quality and resolution, as well as the quality of 3D surface reconstruction.

Item IMoS: Intent-Driven Full-Body Motion Synthesis for Human-Object Interactions (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Ghosh, Anindita; Dabral, Rishabh; Golyanik, Vladislav; Theobalt, Christian; Slusallek, Philipp; Myszkowski, Karol; Niessner, Matthias
Can we make virtual characters in a scene interact with their surrounding objects through simple instructions? Is it possible to synthesize such motion plausibly with a diverse set of objects and instructions? Inspired by these questions, we present the first framework to synthesize the full-body motion of virtual human characters performing specified actions with 3D objects placed within their reach. Our system takes as input textual instructions specifying the objects and the associated 'intentions' of the virtual characters, and outputs diverse sequences of full-body motions. This contrasts with existing works, where full-body action synthesis methods generally do not consider object interactions, and human-object interaction methods focus mainly on synthesizing hand or finger movements for grasping objects.
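Generators of motion sequences like this are commonly run autoregressively: each frame is produced conditioned on the one before it, with a latent sample injecting diversity. A toy rollout loop, where `toy_decoder` is a hypothetical stand-in for a learned conditional decoder (not the actual model of any paper listed here):

```python
import random

def toy_decoder(prev_pose, latent):
    """Hypothetical stand-in for a learned conditional decoder:
    the next pose is the previous pose nudged by a latent sample."""
    return [p + 0.1 * z for p, z in zip(prev_pose, latent)]

def rollout(init_pose, num_frames, seed=0):
    """Autoregressive generation: each frame conditions on the previous one."""
    rng = random.Random(seed)
    poses = [list(init_pose)]
    for _ in range(num_frames):
        latent = [rng.gauss(0.0, 1.0) for _ in init_pose]
        poses.append(toy_decoder(poses[-1], latent))
    return poses
```

Different seeds yield different plausible sequences from the same initial pose, which is the point of sampling-based synthesis.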
We accomplish our objective by designing an intent-driven full-body motion generator, which uses a pair of decoupled conditional variational auto-regressors to learn the motion of the body parts in an autoregressive manner. We also optimize the 6-DoF pose of the objects such that they plausibly fit within the hands of the synthesized characters. We compare our proposed method with existing motion synthesis methods and establish a new and stronger state of the art for the task of intent-driven motion synthesis.

Item Pacific Conference on Computer Graphics and Applications - Short Papers 2019: Frontmatter (Eurographics Association, 2019)
Lee, Jehee; Theobalt, Christian; Wetzstein, Gordon

Item Pacific Conference on Computer Graphics and Applications 2019 - CGF38-7: Frontmatter (The Eurographics Association and John Wiley & Sons Ltd., 2019)
Lee, Jehee; Theobalt, Christian; Wetzstein, Gordon

Item Scene-Aware 3D Multi-Human Motion Capture from a Single Camera (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Luvizon, Diogo C.; Habermann, Marc; Golyanik, Vladislav; Kortylewski, Adam; Theobalt, Christian; Myszkowski, Karol; Niessner, Matthias
In this work, we consider the problem of estimating the 3D position of multiple humans in a scene, as well as their body shape and articulation, from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users, as it enables affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
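Turning per-pixel disparity predictions into 3D scene geometry, as in the setting above, ultimately rests on standard pinhole back-projection. A minimal sketch; the intrinsics and the depth = scale/disparity convention are illustrative assumptions, not the paper's exact formulation:

```python
def depth_from_disparity(disparity, scale=1.0):
    """Illustrative convention: depth is inversely proportional to (scaled) disparity."""
    return scale / disparity

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth into a 3D camera-space point,
    given focal lengths (fx, fy) and principal point (cx, cy)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

Applying `backproject` to every pixel of a per-frame depth map yields the kind of static-scene point cloud such pipelines optimize against.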
Thus, we introduce the first non-linear optimization-based approach that jointly solves for the 3D position of each human, their articulated pose, their individual shapes, and the scale of the scene. In particular, we estimate the scene depth and person scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point cloud of the static scene in 3D space. Finally, given the per-frame 3D estimates of the humans and the scene point cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial, and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks, where we consistently outperform previous methods, and we qualitatively demonstrate that our method is robust to in-the-wild conditions, including challenging scenes with people of different sizes. Code: https://github.com/dluvizon/scene-aware-3d-multi-human

Item State of the Art in Dense Monocular Non-Rigid 3D Reconstruction (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Tretschk, Edith; Kairanda, Navami; B R, Mallikarjun; Dabral, Rishabh; Kortylewski, Adam; Egger, Bernhard; Habermann, Marc; Fua, Pascal; Theobalt, Christian; Golyanik, Vladislav; Bousseau, Adrien; Theobalt, Christian
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since, without additional prior assumptions, it permits infinitely many solutions that project accurately onto the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, and visual content creation.
The key advantage of monocular cameras is their omnipresence and availability to end users, as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods, which handle arbitrary scenes and make only a few prior assumptions, and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g., human faces, bodies, hands, and animals). A significant part of this STAR is also devoted to the classification and high-level comparison of the methods, as well as an overview of the datasets for training and evaluating the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods.

Item State of the Art on Neural Rendering (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Tewari, Ayush; Fried, Ohad; Thies, Justus; Sitzmann, Vincent; Lombardi, Stephen; Sunkavalli, Kalyan; Martin-Brualla, Ricardo; Simon, Tomas; Saragih, Jason; Nießner, Matthias; Pandey, Rohit; Fanello, Sean; Wetzstein, Gordon; Zhu, Jun-Yan; Theobalt, Christian; Agrawala, Maneesh; Shechtman, Eli; Goldman, Dan B.; Zollhöfer, Michael; Mantiuk, Rafal; Sundstedt, Veronica
Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations.
However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., through the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. Specifically, our emphasis is on the type of control, i.e., how the control is provided, which parts of the pipeline are learned, explicit vs. implicit control, generalization, and stochastic vs. deterministic synthesis. The second half of this state-of-the-art report focuses on the many important use cases for the described algorithms, such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
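The integration of differentiable rendering into training, mentioned above, boils down to the rendered image being a differentiable function of scene parameters, so a photometric loss can drive those parameters by gradient descent. A deliberately tiny analysis-by-synthesis sketch with a one-parameter "renderer" (entirely illustrative, not any paper's actual pipeline):

```python
def render(brightness):
    """A one-parameter 'renderer': the image is a single pixel whose value
    equals the scene's brightness parameter."""
    return brightness

def fit(target, steps=100, lr=0.1):
    """Analysis-by-synthesis: minimize the photometric loss (render(b) - target)**2
    by gradient descent on the scene parameter b."""
    b = 0.0
    for _ in range(steps):
        grad = 2.0 * (render(b) - target)  # analytic derivative of the loss w.r.t. b
        b -= lr * grad
    return b
```

Real differentiable renderers apply the same idea to meshes, materials, lighting, or radiance fields, with gradients supplied by automatic differentiation instead of a hand-derived formula.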