2025
Advancing Machine Learning Algorithms for Object Localization in Data-Limited Scenarios: Techniques for 6DoF Pose Estimation and 2D Localization with Limited Data
Pöllabauer, Thomas Jürgen
Computational Design of Deployable Gridshells with Curved Elastic Beams
Becker, Quentin
Efficient Computational Models for Forward and Inverse Elasticity Problems
Li, Yue
Deep High Dynamic Range Imaging: Reconstruction, Generation and Display
Chao Wang
Toward General-Purpose Monte Carlo PDE Solvers for Graphics Applications
Sugimoto, Ryusuke
Deployable, Modular, and Reconfigurable: Computational Design of Umbrella Meshes
Kusupati, Uday
Efficient and Accurate Optimization in Inverse Rendering and Computer Graphics
Fischer, Michael
Proximity-Based Point Cloud Reconstruction
Marin, Diana
Symplectic-prequantum structures and dynamics on the codimension-2 shape space
Sadashige Ishida
How to Train Your Renderer: Optimized Methods for Learning Path Distributions in Monte Carlo Light Transport
Rath, Alexander
Image-based 3D Reconstructions via Differentiable Rendering of Neural Implicit Representations
Tianhao Wu
Recent Submissions
Item Advancing Machine Learning Algorithms for Object Localization in Data-Limited Scenarios: Techniques for 6DoF Pose Estimation and 2D Localization with Limited Data (2025-01-20) Pöllabauer, Thomas Jürgen

Recent successes of Machine Learning (ML) algorithms have profoundly influenced many fields, particularly Computer Vision (CV). One longstanding problem in CV is determining the position and orientation of an object depicted in an image in 3D space, relative to the recording camera sensor. Accurate pose estimation is essential for domains such as robotics, augmented reality, autonomous driving, quality inspection in manufacturing, and many more. Current state-of-the-art pose estimation algorithms are dominated by Deep Learning-based approaches. However, adoption of these best-in-class algorithms in real-world tasks is often constrained by data limitations: not enough training data, data of insufficient quality, missing or noisy annotations, or no directly suitable training data at all. This thesis presents contributions both to 6D object pose estimation and to alleviating the restrictions of data limitations, for pose estimation and for related CV problems such as classification, segmentation, and 2D object detection. It offers a range of solutions to enhance the quality and efficiency of these tasks under different kinds of data limitations. The first contribution enhances a state-of-the-art pose estimation algorithm to predict a probability distribution over poses instead of a single pose estimate. This approach allows sampling multiple plausible poses for further refinement and outperforms the baseline algorithm even when sampling only the most likely pose.
In our second contribution, we drastically improve runtime and reduce resource requirements to bring state-of-the-art pose estimation to low-power edge devices, such as modern augmented and extended reality devices. Finally, we extend a pose estimator based on dense-feature prediction to incorporate additional views and illustrate its performance benefits in the stereo use case. A second set of two contributions focuses on data generation for ML-based CV tasks. High-quality training data is a crucial component for best performance. We introduce a novel yet simple setup to record physical objects and generate all necessary annotations in a fully automated way. Evaluated on the 2D object detection use case, training on our data compares favourably with more complex data generation processes, such as real-world recordings and physically-based rendering. In a follow-up paper, we further improve upon these results by introducing a novel postprocessing step based on denoising diffusion probabilistic models (DDPMs). At the intersection of 6D pose estimation and data generation, a final group of three contributions focuses on solving or circumventing the data problem with a range of approaches. First, we demonstrate the use of physically-based, photorealistic, and non-photorealistic rendering to localize objects on Microsoft HoloLens 2 without needing any real-world images for training. Second, we extend a zero-shot pose estimation method by predicting geometric features, thereby improving estimation quality with almost no additional runtime. Third, we demonstrate pose estimation of objects with unseen appearances based on a 3D scene representation, allowing robust mesh-free pose estimation. In summary, this thesis advances the field of 6D object pose estimation and alleviates common data limitations for pose estimation and similar Machine Learning algorithms in Computer Vision, such as 2D detection and segmentation.
The solutions proposed include several extensions to state-of-the-art 6D pose estimators and address the challenges of limited or poor-quality training data, paving the way for more accurate, efficient, and accessible pose estimation technologies across various industries and fields.

Item Computational Design of Deployable Gridshells with Curved Elastic Beams (EPFL, 2025) Becker, Quentin

Deployable gridshells are lightweight structures made of interconnected elastic beams. They can be actuated from a compact state to a freeform, volume-enclosing deployed shape. This thesis introduces C-shells, a novel class of deployable gridshells that employs curved elastic rods connected at single-axis rotational joints. As opposed to their straight counterparts, C-shells are guaranteed to be assembled in a planar and stress-free configuration while showing a wide diversity in their deployed shapes. They may serve as temporary shelters, pavilions, or, on a smaller scale, as deployable furniture or decorative elements. This thesis presents a comprehensive framework for the forward exploration of C-shell designs, enabling designers to interactively search the shape space and generate deployable structures with diverse appearances and topologies. The framework combines human-interpretable manipulations of a reference linkage with an efficient physics-based simulation to predict the deployed shape and mechanical behavior of the structure. Preservation of the linkage's deployability and smoothness of the edits are ensured through the use of conformal maps as design handles. The framework is implemented as a Rhino-Grasshopper plugin, providing visual and quantitative real-time feedback on the deployed state. The inverse design of C-shells is also addressed, where the deployed shape is given and the flat state of the structure is computed. This thesis introduces a two-step pipeline composed of a flattening method and a design optimization algorithm.
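As a brief aside on the conformal-map design handles mentioned above: a conformal (angle-preserving) planar map can be as simple as a holomorphic polynomial applied to a layout. The following toy Python sketch is illustrative only; the map f(z) = z + c·z² and the grid are our own assumptions, not the thesis's actual handles:

```python
import numpy as np

def conformal_edit(points_xy, c=0.15 + 0.05j):
    """Apply the holomorphic map f(z) = z + c*z^2 to 2D points.
    Wherever f'(z) != 0 such maps are conformal: they preserve angles
    between elements of a planar layout, which is why conformal maps
    can serve as smooth, structure-friendly design handles."""
    z = points_xy[:, 0] + 1j * points_xy[:, 1]
    w = z + c * z**2
    return np.column_stack([w.real, w.imag])

# Deform a small planar grid of joint positions.
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
grid = np.column_stack([xs.ravel(), ys.ravel()])
edited = conformal_edit(grid)
```

Because the edit is a single smooth map of the plane, it deforms the whole layout coherently instead of moving joints one by one.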
The flattening algorithm is based on kinetic considerations underlying the deployment of C-shells. The method harmonizes a flat state and a hypothetical deployed state constrained to a user-prescribed target surface. The flat beam layout is further adjusted to minimize the deviation of the deployed shape from the target surface while ensuring a low-elastic-energy deployed state, under some beam-smoothness regularization. The proposed method is validated through scanned small-scale prototypes. C-shells are made of curved rods, which entails additional material waste compared to straight beams. To address this issue, this thesis presents a rationalization method that splits the curved beams into smaller straight elements that can be grouped into a sparse kit of parts, while preserving user-provided designs. The original combinatorial problem of jointly assigning parts to elements and adapting the parts' geometry is relaxed into a two-step optimization process incorporating our physics-based simulation, making it tractable using continuous optimization techniques. The proposed method applies more generally to bending-active structures and is further demonstrated on orthogonal gridshells and umbrella meshes. Part reuse is assessed in a study of the trade-off between the number of parts and fidelity to the input designs.

Item Efficient Computational Models for Forward and Inverse Elasticity Problems (ETH Zurich, 2025-05-28) Li, Yue

Elasticity is at the core of many scientific and engineering applications, including the design of resilient structures and advanced materials, and the modeling of biological tissues. Simulating elastic systems poses significant computational challenges due to the inherent nonlinearity of the governing equations, which calls for efficient optimization methods to determine equilibrium states. Second-order methods are particularly attractive because of their superior convergence properties relative to first-order techniques.
However, the effective use of second-order solvers requires that the underlying functions and their derivatives are sufficiently smooth and available in closed form. This smoothness can easily degrade when generalizing standard computational models to a broader set of design tasks. This thesis proposes efficient computational models that enable robust and effective simulations for physics-based modeling and the design of complex elastic systems. In the first part, we propose a novel fabric-like metamaterial that features persisting contacts between 3D-printed yarns. To avoid the complexities of explicit contact modeling, we adopt an Eulerian-on-Lagrangian simulation paradigm; however, current methods remain limited to straight rods. We leverage a C²-continuous representation to allow for Newton-type minimization on naturally curved rods. The second part presents a computational paradigm for intrinsic minimization of distance-based objectives defined on triangle meshes. Although Euclidean distances meet the C²-continuity requirement, geodesic distances on triangle meshes do not. To permit efficient second-order optimization of embedded elasticity problems, we provide analytical derivatives as well as suitable mollifiers to recover C²-continuity. Finally, we address non-smoothness issues that arise in nonlinear material design, where changes in geometry parameters can lead to discontinuous changes in simulation meshes. We employ neural networks with tailored nonlinearities as C∞-continuous and differentiable representations to characterize the elastic properties of families of mechanical metamaterials.
The resulting smooth representation enables gradient-based inverse design for various high-level design goals.

Item Deep High Dynamic Range Imaging: Reconstruction, Generation and Display (2025-07-04) Chao Wang

High Dynamic Range (HDR) images offer significant advantages over Low Dynamic Range (LDR) images, including greater bit depth, a wider color gamut, and a higher dynamic range. These features not only provide users with an enhanced visual experience but also facilitate post-production processes in photography and filmmaking. Despite considerable advancements in HDR technology over the years, significant challenges persist in the acquisition and display of HDR content. This thesis systematically explores the potential of leveraging deep learning techniques combined with physical prior knowledge to address these challenges. First, it investigates how implicit neural representations can be utilized to reconstruct all-in-focus HDR images from sparse, defocused LDR inputs, enabling flexible refocusing and re-exposure. Additionally, it extends the scope to the 3D domain by employing 3D Gaussian Splatting to reconstruct HDR all-in-focus fields from multi-view LDR defocused images, supporting novel view synthesis with refocusing and re-exposure capabilities. Expanding further, the thesis investigates strategies for generating HDR content from in-the-wild LDR data or limited HDR datasets, and subsequently utilizes the resulting HDR generative models as priors to enable the transformation of LDR images into HDR. Finally, it proposes a feature contrast masking loss inspired by visual masking theory, enabling a self-supervised tone mapper to display HDR content on LDR devices.

Item Toward General-Purpose Monte Carlo PDE Solvers for Graphics Applications (University of Waterloo, 2025-09-22) Sugimoto, Ryusuke

This thesis develops novel Monte Carlo methods for solving a wide range of partial differential equations (PDEs) relevant to computer graphics.
While traditional discretization-based approaches efficiently compute global solutions, they often require expensive global solves even when only local evaluations are needed, and can struggle with complex or fine-scale geometries. Monte Carlo methods based on the classical Walk on Spheres (WoS) approach [Muller 1956] offer pointwise evaluation with strong geometric robustness, but in practice, their application has been largely limited to interior Dirichlet problems in volumetric domains. We significantly broaden this scope by designing versatile Monte Carlo solvers that handle a diverse set of PDEs and boundary conditions, validated through comprehensive experimental results. First, we introduce the Walk on Boundary (WoB) method [Sabelfeld 1982, 1991] to graphics. While retaining WoS’s advantages, WoB applies to a broader range of second-order linear elliptic and parabolic PDE problems: various boundary conditions (Dirichlet, Neumann, Robin, and mixed) in both interior and exterior domains. Because WoB is based on boundary integral formulations, its structure more closely parallels Monte Carlo rendering than WoS, enabling the application of advanced variance reduction techniques. We present WoB formulations for elliptic Laplace and Poisson equations, time-dependent diffusion problems, and develop a WoB solver for vector-valued Stokes equations. Throughout, we discuss how sampling and variance reduction methods from rendering can be adapted to WoB. Next, we address the nonlinear Navier-Stokes equations for fluid simulation, whose complexity challenges Monte Carlo techniques. Employing operator splitting, we separate nonlinear terms and solve the remaining linear terms with pointwise Monte Carlo solvers. Recursively applying these solvers with timestepping yields a spatial-discretization-free method. To deal with the resulting exponential computational cost, we also propose cache-based alternatives. 
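For readers unfamiliar with the classical Walk on Spheres estimator referenced in this abstract, it is compact enough to sketch. The following minimal Python example is illustrative only (the disk domain, boundary data, and sample count are our own assumptions, not from the thesis); it estimates a harmonic function pointwise by repeatedly jumping to a uniform point on the largest empty sphere around the current position:

```python
import math
import random

def walk_on_spheres(x, y, dist_to_boundary, boundary_value, eps=1e-3, max_steps=1000):
    """One WoS sample of u(x, y) for Laplace's equation with Dirichlet
    data: jump to a uniformly random point on the largest boundary-free
    circle around the current point until within eps of the boundary,
    then read off the boundary value there."""
    for _ in range(max_steps):
        r = dist_to_boundary(x, y)
        if r < eps:
            return boundary_value(x, y)
        theta = random.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(theta)
        y += r * math.sin(theta)
    return boundary_value(x, y)  # fallback after max_steps

# Toy problem: unit disk with boundary data g(x, y) = x,
# whose harmonic extension is simply u(x, y) = x.
def dist(x, y):
    return 1.0 - math.hypot(x, y)  # distance to the unit circle, from inside

def g(x, y):
    return x

random.seed(0)
n = 20000
estimate = sum(walk_on_spheres(0.3, 0.0, dist, g) for _ in range(n)) / n
# The Monte Carlo estimate should be close to the exact value u(0.3, 0) = 0.3.
```

Note the property the abstract highlights: the solution is evaluated at a single point, with no mesh or global solve, and the only geometric query needed is a distance to the boundary.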
Both vorticity- and velocity-based formulations are explored, retaining the advantages of Monte Carlo methods, including geometric robustness and variance reduction, while integrating traditional fluid simulation techniques. We then propose Projected Walk on Spheres (PWoS), a novel solver for surface PDEs inspired by the Closest Point Method. PWoS modifies WoS by projecting random walks onto the surface manifold at each step, preserving geometric flexibility and discretization-free, pointwise evaluation. We also adapt a noise filtering technique for WoS to improve PWoS. Finally, we outline promising future research directions for Monte Carlo PDE solvers in graphics, including concrete proposals to enhance WoB.

Item Lifelike Motions for Robotic Characters (ETH Zurich, 2025-09) Serifi, Agon

Humanoids have made significant advances in recent years. Nonetheless, the motions they perform often remain rigid and mechanical, lacking the diversity and expressiveness of human motion. This stands in stark contrast to physics-based simulated characters, which are capable of performing agile and lifelike motions in fully simulated environments. Such characters typically leverage reinforcement learning in combination with motion capture data to learn how to move like humans. However, their success is closely tied to unrealistic modeling assumptions such as simplified dynamics, overpowered actuators, or noise-free sensing. While these assumptions enable efficient and stable training, they hinder transfer to the real world. In the real world, there are no shortcuts. To achieve more dynamic motions for humanoids, physically accurate simulation and robust learning methods are essential. This requires rethinking many components along the pipeline, from the simulators and how to account for sim-to-real gaps, up to questions of how to represent, track, and generate motions for humanoids.
In this dissertation, we present several contributions in this direction and bring more lifelike motions to robotic characters. First, we present a learning-based modular simulation augmentation to reduce the sim-to-real gap. Our method generalizes across robot configurations and helps to better estimate the state of the robot. In a second contribution, we propose a novel architecture for encoding motions as trajectories in latent space. The architecture overcomes the need for absolute positional encoding, leading to better reconstruction quality on various sequential data types. In a third contribution, we show how a pretrained latent space can be leveraged to train more accurate and robust control policies using reinforcement learning. Our two-stage method transfers to the real world and brings dynamic dancing motions to a humanoid robot. Our last contribution physically aligns kinematic motion generators with the capabilities of the character and its control policy. This allows for a more successful transfer of generated motions to the real world. The methods and concepts introduced in this dissertation make robots move more lifelike and reduce the gap to simulated characters. We hope they will inspire future research and bring more believable robots into our world.

Item Deployable, Modular, and Reconfigurable: Computational Design of Umbrella Meshes (EPFL, 2025-07-24) Kusupati, Uday

Deployable structures that transform from a planar, assembly-friendly compact state to an expansive freeform surface state have diverse applications in robotics, medical devices, temporary installations, and architecture. Umbrella Meshes are a new class of volumetric deployable structures with extensive shape expression capabilities compared to existing plane-to-surface deployables. They are modular, made of umbrella cells consisting of identical rigid plates and rotational joints connected by elastic beams of varying heights.
Deployment is actuated by pushing the cells orthogonally to the plane, rotating the elastic beams from vertical to horizontal configurations and thus redistributing material from out of the plane into it. In contrast to rigid scissor mechanisms, the beams deform elastically, making the deployed equilibrium bending-active. Assembled in a stress-free planar configuration, an Umbrella Mesh can be programmed to deploy to a desired target shape by virtue of the optimized heights of its constituent cells. The rich design space facilitates programming a large range of target shapes, controlling the structural stiffness, and encoding extrinsic curvature. This thesis contributes a comprehensive computational framework for the design and optimization of Umbrella Meshes. To facilitate design exploration of the deployed structure, we develop a physics-based simulation modeling the deployment process under actuation forces. We abstract the deployment transformation of an umbrella mesh using conformal geometry, providing intuitive design initializations for a specific target surface. Our inverse design algorithm leverages the simulation pipeline and numerical optimization to iteratively refine a design to approximate a target surface while minimizing the elastic energy and actuation forces involved. We build optimized physical prototypes through digital fabrication and validate our computational pipeline. The inverse design framework exemplifies a design-driven approach to fabricating optimized physical structures. The latter half of this thesis focuses on fabrication-driven design. We develop a computational framework to rationalize bending-active structures into a sparse kit of parts, allowing cost-effective fabrication. Our method can either find an optimal kit of parts for multiple input designs or rationalize existing designs to use a pre-fabricated kit of parts.
To tackle the non-trivial coupling of components in bending-active systems, we propose a relaxed continuous formulation of the combinatorial problem of grouping components into a sparse part set, allowing us to incorporate a physics-based simulation that tracks multiple bending-active equilibria. We demonstrate our approach on Umbrella Meshes, C-shells, and orthogonal gridshells. The thesis culminates with Reconfigurable Umbrella Meshes (RUMs), consisting of identical reconfigurable cells. Each reconfigurable cell can assume the form of a continuous range of parts, thus combining the benefits of pre-fabrication and precisely inverse-designed heights. Assembled from these identical, mass-producible cells, the same RUM can deploy into several shapes over multiple deployment cycles. Our inverse design enables precise reconfiguration of the compact state and opens up multiple research avenues for high-fidelity shape-morphing control with applications in soft robotics and sustainable architecture.

Item Efficient and Accurate Optimization in Inverse Rendering and Computer Graphics (2025-05-09) Fischer, Michael

Efficient and accurate representation of graphics assets, a long-standing task in the graphics community, has reached new heights with the advent of learning-based methods that represent visual appearance as neural networks. Surprisingly, such visual appearance networks are often trained from scratch, an expensive operation that ignores potentially helpful information from previous training runs. This thesis therefore introduces Metappearance, an algorithm that optimizes over optimization itself and enables orders-of-magnitude faster training times at indistinguishable visual quality, while retaining the network's adaptability to new, unseen data. Moreover, even a fully converged network, albeit a smooth function, does not guarantee optimization success when employed in an inverse rendering scenario.
In fact, it is common for inverse rendering to exhibit plateaus, regions of zero gradient in the cost function, which hinder gradient-based optimization from converging. Chapter 4 therefore introduces an algorithm that smooths out such plateaus by convolving the rendering equation with a Gaussian blur kernel, and thus successfully optimizes scenarios where other, more rigid methods fail to converge. Finally, while recent research has shown that specialized treatment of a renderer's internals can yield correct, usable gradients, there is no unified, systematic way of differentiating through arbitrary, black-box graphics pipelines. We therefore introduce the concept of neural surrogates, which allow differentiating through arbitrary forward models without requiring access to, or making any assumptions about, the rendering pipeline's internals. We show that our neural surrogate losses can successfully optimize various graphics tasks and scale well to high dimensions, a domain where traditional derivative-free optimizers often do not converge.

Item Proximity-Based Point Cloud Reconstruction (2025-02-13) Marin, Diana

Extrapolating information from incomplete data is a key human skill, enabling us to infer patterns and make predictions from limited observations. A prime example is our ability to perceive coherent shapes from seemingly random point sets, a key aspect of cognition. However, data reconstruction becomes challenging when no predefined rules exist, as it is unclear how to connect the data or infer patterns. In computer graphics, a major goal is to replicate this human ability by developing algorithms that can accurately reconstruct original structures or extract meaningful information from raw, disconnected data. The contributions of this thesis deal with point cloud reconstruction using proximity-based methods, with a particular focus on a specific proximity-encoding data structure: the spheres-of-influence graph (SIG).
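Since the spheres-of-influence graph is central to the contributions that follow, a brief sketch may help: the SIG connects two points whenever their spheres of influence, balls whose radius equals each point's nearest-neighbor distance, intersect. The toy points and the brute-force O(n²) construction below are our own illustration, not the thesis's implementation:

```python
import math
from itertools import combinations

def sphere_of_influence_graph(points):
    """Edges of the spheres-of-influence graph (SIG): each point gets a
    ball whose radius is its nearest-neighbor distance; two points are
    connected iff their balls intersect. Note: parameter-free."""
    def d(p, q):
        return math.dist(p, q)
    radius = {p: min(d(p, q) for q in points if q != p) for p in points}
    return [(p, q) for p, q in combinations(points, 2)
            if d(p, q) <= radius[p] + radius[q]]

# Two well-separated clusters: SIG links nearby points but never
# crosses the gap, with no threshold parameter to tune.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.1), (5.0, 5.0), (5.5, 5.0)]
edges = sphere_of_influence_graph(pts)
```

In practice one would use a spatial index for the nearest-neighbor queries, but the definition itself is this simple.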
We discuss curve reconstruction, where we automate the game of connecting the dots to create contours, providing theoretical guarantees for our method and obtaining the best results among comparable methods for manifold curves. We extend our curve reconstruction to manifolds, overcoming the challenges of moving to different domains and extending our theoretical guarantees. We are able to reconstruct curves from sparser inputs than the state-of-the-art, and we explore various settings in which these curves can live. We investigate the properties of the SIG as a parameter-free proximity-encoding structure for three-dimensional point clouds. As a theoretical contribution, we introduce new spatial bounds for SIG neighbors. We analyze how closely the encoding matches the ground-truth surface compared to the commonly used kNN graphs, and we evaluate our performance on normal estimation as an application. Lastly, we introduce SING, a stability-incorporated neighborhood graph: a useful tool with various applications, such as clustering, and a strong theoretical grounding in topological data analysis.

Item Symplectic-prequantum structures and dynamics on the codimension-2 shape space (2025-10-31) Sadashige Ishida

The space of codimension-2 shapes, such as curves in 3D and surfaces in 4D, is an infinite-dimensional manifold. This thesis explores geometric structures and dynamics on this space, with emphasis on their implications for physics, particularly hydrodynamics.
Our investigation ranges from theoretical studies of infinite-dimensional symplectic and prequantum geometry to numerical computation of the time evolution of shapes.

Item How to Train Your Renderer: Optimized Methods for Learning Path Distributions in Monte Carlo Light Transport (2025-05-06) Rath, Alexander

Light transport simulation allows us to preview architectural marvels before they break ground, practice complex surgeries without a living subject, and explore alien worlds from the comfort of our homes. Fueled by steady advancements in computer hardware, rendering virtual scenes is more accessible than ever and is met by an unprecedented demand for such content. Light interacts with our world in various intricate ways; hence, the challenge in realistic rendering lies in tracing all the possible paths that light could take within a given virtual scene. Contemporary approaches predominantly rely on Monte Carlo integration, for which countless sampling procedures have been proposed to handle certain families of effects robustly. Handling all effects holistically through specialized sampling routines, however, remains an unsolved problem. A promising alternative is to use learning techniques that automatically adapt to the effects present in the scene. However, such approaches require many complex design choices, for which existing works commonly resort to heuristics. In this work, we investigate what constitutes effective learning algorithms for rendering, from data representation and the quantities to be learned to the fitting process itself.
By strategically optimizing these components for desirable goals, such as overall render efficiency, we demonstrate significant improvements over existing approaches.

Item Image-based 3D Reconstructions via Differentiable Rendering of Neural Implicit Representations (2025-02-14) Tianhao Wu

Modeling objects in 3D is critical for various graphics and metaverse applications and is a fundamental step towards 3D machine reasoning; the ability to reconstruct objects from RGB images alone significantly broadens these applications. Representing objects in 3D involves learning two distinct aspects: geometry, which describes where the mass is located, and appearance, which determines the exact pixel colors rendered on screen. While learning approximate appearance with known geometry is straightforward, obtaining correct geometry, or recovering both simultaneously, from RGB images alone has long been a challenging task. Recent advancements in Differentiable Rendering and Neural Implicit Representations have significantly pushed the limits of geometry and appearance reconstruction from RGB images. Utilizing their continuous, differentiable, and less restrictive representations, it is possible to optimize geometry and appearance simultaneously from ground-truth images, leading to much better reconstruction accuracy and re-rendering quality. As one of the major neural implicit representations to receive great attention, the Neural Radiance Field (NeRF) achieves clean and straightforward reconstruction of volumetric geometry and non-Lambertian appearance together from a dense set of RGB images. Various other representations and modifications have also been proposed to handle specific tasks such as smooth surface modeling, sparse-view reconstruction, or dynamic scene reconstruction.
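To make the NeRF-style differentiable rendering mentioned above concrete, the core operation is the standard volume-rendering quadrature: alpha-compositing densities and colors sampled along each camera ray. A minimal NumPy sketch (the toy ray values are illustrative; the thesis's pipelines are more involved):

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Standard NeRF volume-rendering quadrature: convert per-sample
    densities to alphas, accumulate transmittance along the ray, and
    blend colors with the resulting weights. Every step is smooth, so
    gradients flow back to the underlying implicit representation."""
    alphas = 1.0 - np.exp(-densities * deltas)                      # opacity per segment
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # light surviving to each sample
    weights = trans * alphas                                        # contribution of each sample
    rgb = (weights[:, None] * colors).sum(axis=0)                   # composited pixel color
    return rgb, weights

# Toy ray: two empty samples followed by a dense red region.
densities = np.array([0.0, 0.0, 20.0, 20.0])
colors = np.array([[0, 0, 0], [0, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
deltas = np.full(4, 0.25)
rgb, weights = composite_ray(densities, colors, deltas)
# The dense samples dominate, so rgb comes out close to pure red.
```

Optimizing the photometric loss between such composited pixels and captured images is what drives the simultaneous recovery of geometry (densities) and appearance (colors).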
However, existing methods still make strict assumptions about the scenes captured and reconstructed, significantly constraining their application scenarios. For instance, current reconstructions typically assume the scene to be perfectly opaque with no semi-transparent effects, or assume no dynamic noise or occluders are included in the capture, or do not optimize rendering efficiency for high-frequency appearance in the scene. In this dissertation, we present three advancements that push the quality of image-based 3D reconstruction towards robust, reliable, and user-friendly real-world solutions. Our improvements cover the representation, architecture, and optimization of image-based 3D reconstruction approaches. First, we introduce AlphaSurf, a novel implicit representation with decoupled geometry and surface opacity and a grid-based architecture that enables accurate surface reconstruction of intricate or semi-transparent objects. Compared to a traditional image-based 3D reconstruction pipeline that considers only geometry and appearance, it computes the ray-surface intersection and the intersection opacity separately while keeping both naturally differentiable, supporting decoupled optimization from a photometric loss. Specifically, intersections on AlphaSurf are found in closed form via analytical solutions of cubic polynomials, avoiding Monte Carlo sampling, and are therefore fully differentiable by construction, whereas additional grid-based opacity and radiance fields are incorporated to allow reconstruction from RGB images only. We then consider the dynamic noise and occluders accidentally included in captures intended for static 3D reconstruction, a common challenge in the real world. This issue is particularly problematic for street scans or scenes with potentially dynamic content, such as cars, humans, or plants.
We propose D^2NeRF, a method that reconstructs 3D scenes from casual mobile phone videos with all dynamic occluders decoupled from the static scene. This approach incorporates modeling of both 3D and 4D objects from RGB images and utilizes freedom constraints to achieve dynamic decoupling without semantic-based guidance. Hence, it can handle uncommon dynamic effects such as pouring liquid and moving shadows. Finally, we look into the efficiency constraints of 3D reconstruction and rendering, and specifically propose a solution for lightweight representation of scene components with simple geometry but high-frequency textures. We utilize a sparse set of anchors with correspondences from 3D to 2D texture space, enabling the high-frequency clothing of a forward-facing neural avatar to be modeled using a 2D texture with neural deformation as a simplified and constrained representation. This dissertation provides a comprehensive overview of neural implicit representations and their applications in 3D reconstruction from RGB images, along with several advancements for achieving more robust and efficient reconstruction in challenging real-world scenarios. We demonstrate that the representation, architecture, and optimization need to be specifically designed for the obstacles of the image-based reconstruction task, owing to the severely ill-posed nature of the problem. With the correct design, we can reconstruct translucent surfaces, remove dynamic occluders from the capture, and efficiently model high-frequency appearance from only posed multi-view images or monocular video.

Item Neural Point-based Rendering for Immersive Novel View Synthesis (Open FAU, 2025-05-26) Franke, Linus

Recent advances in neural rendering have greatly improved the realism and efficiency of digitizing real-world environments, enabling new possibilities for virtual experiences.
However, achieving high-quality digital replicas of physical spaces is challenging due to the need for advanced 3D reconstruction and real-time rendering techniques, with visual output often deteriorating under challenging capture conditions. This thesis therefore explores point-based neural rendering approaches to address key challenges such as geometric inconsistencies, scalability, and perceptual fidelity, ultimately enabling realistic and interactive virtual scene exploration. The vision is to enable immersive virtual reality (VR) scene exploration and virtual teleportation with the best perceptual quality for the user. This work introduces techniques that improve point-based Novel View Synthesis (NVS) by refining geometric accuracy and reducing visual artifacts. By detecting and correcting errors in point-cloud-based reconstructions, this approach improves rendering stability and accuracy. Additionally, an efficient rendering pipeline is proposed that combines rasterization with neural refinement to achieve high-quality results at real-time frame rates, ensuring smooth and consistent visual output across diverse scenes. To extend the scalability of neural point representations, a hierarchical structure is presented that efficiently organizes and renders massive point clouds, enabling real-time NVS of city-scale environments. Furthermore, a perceptually optimized foveated rendering technique is developed for VR applications, leveraging the characteristics of the human visual system to balance performance and perceptual quality. Lastly, a real-time neural reconstruction technique is proposed that eliminates preprocessing requirements, allowing for immediate virtual teleportation and interactive scene exploration. Through these advances, this thesis pushes the boundaries of neural point-based rendering, offering solutions that balance quality, efficiency, and scalability.
The findings pave the way for more interactive and immersive virtual experiences, with applications spanning VR, augmented reality (AR), and digital content exploration.