Browsing by Author "Golyanik, Vladislav"
Now showing 1 - 5 of 5
Item: IMoS: Intent-Driven Full-Body Motion Synthesis for Human-Object Interactions (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Ghosh, Anindita; Dabral, Rishabh; Golyanik, Vladislav; Theobalt, Christian; Slusallek, Philipp; Myszkowski, Karol; Niessner, Matthias

Can we make virtual characters in a scene interact with their surrounding objects through simple instructions? Is it possible to synthesize such motion plausibly with a diverse set of objects and instructions? Inspired by these questions, we present the first framework to synthesize the full-body motion of virtual human characters performing specified actions with 3D objects placed within their reach. Our system takes as input textual instructions specifying the objects and the associated 'intentions' of the virtual characters, and outputs diverse sequences of full-body motions. This contrasts with existing work, where full-body action synthesis methods generally do not consider object interactions, and human-object interaction methods focus mainly on synthesizing hand or finger movements for grasping objects. We accomplish our objective by designing an intent-driven full-body motion generator, which uses a pair of decoupled conditional variational auto-regressors to learn the motion of the body parts in an autoregressive manner. We also optimize the 6-DoF pose of the objects such that they plausibly fit within the hands of the synthesized characters. We compare our proposed method with existing motion synthesis methods and establish a new, stronger state of the art for the task of intent-driven motion synthesis.

Item: Recent Trends in 3D Reconstruction of General Non-Rigid Scenes (The Eurographics Association and John Wiley & Sons Ltd., 2024)
Yunus, Raza; Lenssen, Jan Eric; Niemeyer, Michael; Liao, Yiyi; Rupprecht, Christian; Theobalt, Christian; Pons-Moll, Gerard; Huang, Jia-Bin; Golyanik, Vladislav; Ilg, Eddy; Aristidou, Andreas; Macdonnell, Rachel

Reconstructing models of the real world, including the 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesis of photorealistic novel views, useful for the movie industry and AR/VR applications, and facilitates the content creation necessary in computer games and AR/VR by avoiding laborious manual design processes. Further, such models are fundamental for intelligent computing systems that need to interpret real-world scenes and actions to act and interact safely with the human world. Notably, the world surrounding us is dynamic, and reconstructing models of dynamic, non-rigidly moving scenes is a severely underconstrained and challenging problem. This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs, such as data from RGB and RGB-D sensors, among others, conveying an understanding of different approaches, their potential applications, and promising further research directions. The report covers 3D reconstruction of general non-rigid scenes and further addresses techniques for scene decomposition, editing and controlling, and generalizable and generative modeling.
More specifically, we first review the common and fundamental concepts necessary to understand and navigate the field, and then discuss the state-of-the-art techniques by reviewing recent approaches that use traditional and machine-learning-based neural representations, including a discussion of the newly enabled applications. The STAR concludes with a discussion of the remaining limitations and open challenges.

Item: Scene-Aware 3D Multi-Human Motion Capture from a Single Camera (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Luvizon, Diogo C.; Habermann, Marc; Golyanik, Vladislav; Kortylewski, Adam; Theobalt, Christian; Myszkowski, Karol; Niessner, Matthias

In this work, we consider the problem of estimating the 3D position of multiple humans in a scene, as well as their body shape and articulation, from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users, as it enables affordable 3D motion capture that is easy to install and does not require expert knowledge. To deal with this challenging setting, we leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. Thus, we introduce the first non-linear optimization-based approach that jointly solves for the 3D position of each human, their articulated pose, and their individual shapes, as well as the scale of the scene. In particular, we estimate the scene depth and person scale from normalized disparity predictions using the 2D body joints and joint angles. Given the per-frame scene depth, we reconstruct a point cloud of the static scene in 3D space. Finally, given the per-frame 3D estimates of the humans and the scene point cloud, we perform a space-time coherent optimization over the video to ensure temporal, spatial, and physical plausibility. We evaluate our method on established multi-person 3D human pose benchmarks, where we consistently outperform previous methods, and we qualitatively demonstrate that our method is robust to in-the-wild conditions, including challenging scenes with people of different sizes. Code: https://github.com/dluvizon/scene-aware-3d-multi-human

Item: State of the Art in Dense Monocular Non-Rigid 3D Reconstruction (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Tretschk, Edith; Kairanda, Navami; B R, Mallikarjun; Dabral, Rishabh; Kortylewski, Adam; Egger, Bernhard; Habermann, Marc; Fua, Pascal; Theobalt, Christian; Golyanik, Vladislav; Bousseau, Adrien; Theobalt, Christian

3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since, without additional prior assumptions, it admits infinitely many solutions that project accurately onto the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of monocular cameras is their omnipresence and availability to end users, as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods (which handle arbitrary scenes and make only a few prior assumptions) and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g. human faces, bodies, hands, and animals).
A significant part of this STAR is also devoted to the classification and high-level comparison of the methods, as well as an overview of datasets for training and evaluating the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods.

Item: State of the Art on Diffusion Models for Visual Computing (The Eurographics Association and John Wiley & Sons Ltd., 2024)
Po, Ryan; Yifan, Wang; Golyanik, Vladislav; Aberman, Kfir; Barron, Jon T.; Bermano, Amit; Chan, Eric; Dekel, Tali; Holynski, Aleksander; Kanazawa, Angjoo; Liu, C. Karen; Liu, Lingjie; Mildenhall, Ben; Nießner, Matthias; Ommer, Björn; Theobalt, Christian; Wonka, Peter; Wetzstein, Gordon; Aristidou, Andreas; Macdonnell, Rachel

The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has grown exponentially, with relevant papers published across the computer graphics, computer vision, and AI communities and new works appearing daily on arXiv. This rapid growth of the field makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, as well as to give an overview of important aspects of these generative AI tools, including personalization, conditioning, and inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point for researchers, artists, and practitioners alike to explore this exciting topic.
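The "basic mathematical concepts of diffusion models" that the last report introduces can be illustrated with a minimal toy sketch of the DDPM-style forward (noising) and reverse (denoising) processes. This is an illustrative assumption on our part, not code from the report or from Stable Diffusion: the linear beta schedule, the function names, and the use of the true noise as an "oracle" predictor (in place of a learned network eps_theta) are all hypothetical choices made only to keep the example self-contained.

```python
import numpy as np

# Toy DDPM sketch: closed-form forward noising q(x_t | x_0) and one
# reverse step of p(x_{t-1} | x_t). Schedule and names are illustrative.
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative products \bar{alpha}_t

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) in closed form, given noise eps ~ N(0, I)."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def p_step(x_t, t, eps_pred, rng):
    """One reverse step x_t -> x_{t-1}, given a noise estimate eps_pred.

    In a real diffusion model, eps_pred comes from a trained network.
    """
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                      # final step is deterministic
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)             # stand-in for a data sample
eps = rng.standard_normal(4)
x_T = q_sample(x0, T - 1, eps)          # nearly pure noise at t = T-1

# Using the true eps as an oracle predictor, one reverse step moves
# x_T back toward the data; sampling iterates this from t = T-1 down to 0.
x_prev = p_step(x_T, T - 1, eps, rng)
```

The key identity the sketch relies on is that the forward chain can be sampled at any timestep t in one shot via the cumulative product alpha_bars, which is what makes training with random timesteps cheap.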