Italian Chapter Conference 2024 - Smart Tools and Apps in Graphics
Item Meshtrics: Objective Quality Assessment of Textured 3D Meshes for 3D Reconstruction (The Eurographics Association, 2024) Madeira, Tiago; Oliveira, Miguel; Dias, Paulo; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
In the context of 3D reconstruction, the pursuit of photorealistic models requires precise, objective quality evaluation methods. In this work, we investigate several potential objective metrics for the quality assessment of textured 3D meshes by evaluating their correlation with human perception of visual quality. We conduct experiments using a publicly available, subjectively rated database of textured 3D meshes containing various types of geometry and texture distortions. Based on these experiments, we discuss the characteristics and limitations of the evaluated metrics. Notably, image-based metrics demonstrated the strongest correlation with subjective scores in most tested scenarios, suggesting that 2D image metrics are reliable predictors of 3D model visual quality. We then introduce a framework designed to facilitate the analysis of various characteristics of 3D models and their fidelity, with a particular focus on image-based metrics leveraging photographs of real-world environments as reference. Our toolkit streamlines the generation of renders and the application of quality metrics, enabling manual annotation in 2D and 3D spaces, while incorporating an automatic alignment refinement step for precise registration of reference photographs. We evaluate the proposed approach using a dataset generated through the 3D reconstruction of a complex indoor environment. Our experiments support the efficacy of the solution in benchmarking 3D reconstruction results, enabling timely, informed adjustments to the reconstruction methodology.
Source code is available at https://github.com/tiagomfmadeira/Meshtrics.

Item Surface Reconstruction from Silhouette and Laser Scanners as a Positive-Unlabeled Learning Problem (The Eurographics Association, 2024) Gottardo, Mario; Pistellato, Mara; Bergamasco, Filippo; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Typical 3D reconstruction pipelines employ a combination of line-laser scanners and robotic actuators to produce a point cloud and then proceed with surface reconstruction. In this work, we propose a new technique to learn an Implicit Neural Representation (INR) of a 3D shape S without directly observing points on its surface. We only assume that we can determine whether a 3D point is exterior to S (e.g., by observing whether its projection falls outside the silhouette or by detecting on which side of the laser line the point lies). In this setting, we cast the reconstruction process as a Positive-Unlabeled learning problem where sparse 3D points, sampled according to a distribution depending on the INR's local gradient, have to be classified as being interior or exterior to S. These points are used to train the INR iteratively so that its zero-crossing converges to the boundary of the shape. Preliminary experiments performed on a synthetic dataset demonstrate the advantages of the approach.

Item Environment Maps Editing using Inverse Rendering and Adversarial Implicit Functions (The Eurographics Association, 2024) D'Orazio, Antonio; Sforza, Davide; Pellacini, Fabio; Masi, Iacopo; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Editing High Dynamic Range (HDR) environment maps using an inverse differentiable rendering architecture is a complex inverse problem due to the sparsity of relevant pixels and the challenges in balancing light sources and background.
The pixels illuminating the objects are a small fraction of the total image, leading to noise and convergence issues when the optimization directly involves pixel values. HDR images, with pixel values beyond the typical Standard Dynamic Range (SDR), pose additional challenges. Higher learning rates corrupt the background during optimization, while lower learning rates fail to manipulate light sources. Our work introduces a novel method for editing HDR environment maps using differentiable rendering, addressing sparsity and variance between values. Instead of introducing strong priors that extract the relevant HDR pixels and separate the light sources, or using tricks such as optimizing the HDR image in the log space, we propose to model the optimized environment map with a new variant of implicit neural representations able to handle HDR images. The neural representation is trained with adversarial perturbations over the weights to ensure smooth changes in the output when it receives gradients from the inverse rendering. In this way, we obtain novel and cheap environment maps without relying on latent spaces of expensive generative models, maintaining the original visual consistency. Experimental results demonstrate the method's effectiveness in reconstructing the desired lighting effects while preserving the fidelity of the map and reflections on objects in the scene. Our approach can pave the way to interesting tasks, such as estimating a new environment map given a rendering with novel light sources, maintaining the initial perceptual features, and enabling brush stroke-based editing of existing environment maps.
Our code is publicly available at github.com/OmnAI-Lab/R-SIREN.

Item FAST GDRNPP: Improving the Speed of State-of-the-Art 6D Object Pose Estimation (The Eurographics Association, 2024) Pöllabauer, Thomas; Pramod, Ashwin; Knauthe, Volker; Wahl, Michael; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
6D object pose estimation involves determining the three-dimensional translation and rotation of an object within a scene and relative to a chosen coordinate system. This problem is of particular interest for many practical applications in industrial tasks such as quality control, bin picking, and robotic manipulation, where both speed and accuracy are critical for real-world deployment. Current models, both classical and deep-learning-based, often struggle with the trade-off between accuracy and latency. Our research focuses on enhancing the speed of a prominent state-of-the-art deep learning model, GDRNPP, while keeping its high accuracy. We employ several techniques to reduce the model size and improve inference time. These techniques include using smaller and quicker backbones, pruning unnecessary parameters, and distillation to transfer knowledge from a large, high-performing model to a smaller, more efficient student model. Our findings demonstrate that the proposed configuration maintains accuracy comparable to the state-of-the-art while significantly improving inference time. This advancement could lead to more efficient and practical applications in various industrial scenarios, thereby enhancing the overall applicability of 6D Object Pose Estimation models in real-world settings.

Item Disambiguating Flat Spots in Digital Elevation Models (The Eurographics Association, 2024) Rocca, Luigi; Puppo, Enrico; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
We consider Digital Elevation Models (DEMs) encoded as regular grids of discrete elevation data samples.
When the terrain's slope is low relative to the dataset's vertical resolution, the DEM may contain flat spots: connected areas where all points share the same elevation. Flat spots can hinder certain analyses, such as topological characterization or drainage network computations. We discuss the application of Morse-Smale theory to grids and the disambiguation of flat spots. Specifically, we show how to characterize the topology of flat spots and symbolically perturb their elevation data to make the DEM compatible with Morse-Smale theory while preserving its topological properties. Our approach applies equivalently to three different surface models derived from the DEM grid: the step model, the bilinear model, and a piecewise-linear model based on the quincunx lattice.

Item Advancing Environmental Modeling with Unstructured Meshes: Current Research and Development (The Eurographics Association, 2024) Miola, Marianna; Cabiddu, Daniela; Mortara, Michela; Pittaluga, Simone; Sorgente, Tommaso; Zuccolini, Marino Vetuschi; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Modeling the distribution of environmental variables across spatial domains presents significant challenges. Geostatistics offers a robust set of tools for accurately predicting values and associated uncertainties at unsampled locations, accounting for spatial correlations. However, these tools are often constrained by their reliance on structured domain representations, limiting their flexibility in modeling complex or irregular structures. By exploring the use of unstructured meshes, we can achieve a more efficient and accurate representation of localized phenomena, thereby enhancing our ability to model spatial patterns.
Our current efforts are focused on integrating unstructured meshes into the geostatistical modeling pipeline, encompassing everything from mesh generation (and possibly refinement) to their application in stochastic simulation and the segmentation of the domain into regions where the distribution of variables is homogeneous. Preliminary results are promising, demonstrating the potential of this innovative approach.

Item Peek-a-bot: learning through vision in Unreal Engine (The Eurographics Association, 2024) Pietra, Daniele Della; Garau, Nicola; Conci, Nicola; Granelli, Fabrizio; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Humans learn to navigate and interact with their surroundings through their senses, particularly vision. Ego-vision has lately become a significant focus in computer vision, enabling neural networks to learn from first-person data effectively, as we humans do. Supervised or self-supervised learning of depth, object location and segmentation maps through deep networks has shown considerable success in recent years. On the other hand, reinforcement learning (RL) has been focusing on learning from different kinds of sensing data, such as rays, collisions, distances, and other types of observations. In this paper, we merge the two approaches, providing a complete pipeline to train reinforcement learning agents inside virtual environments, relying only on vision and eliminating the need for traditional RL observations. We demonstrate that visual stimuli, if encoded by a carefully designed vision encoder, can provide informative observations, thus replacing ray-based approaches and drastically simplifying the reward shaping typical of classical RL. Our method is fully implemented inside Unreal Engine 5, from the real-time inference of visual features to the online training of the agents' behaviour using the Proximal Policy Optimization (PPO) algorithm.
To the best of our knowledge, this is the first in-engine solution targeting video games and simulation, enabling game developers to easily train vision-based RL agents without writing a single line of code. All the code, complete experiments and analysis will be available at https://mmlab-cv.github.io/Peek-a-bot/.

Item Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence (The Eurographics Association, 2024) Riva, Alessandro; Raganato, Alessandro; Melzi, Simone; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Current data-driven methodologies for point cloud matching demand extensive training time and computational resources, presenting significant challenges for model deployment and application. In the point cloud matching task, recent advancements with an encoder-only Transformer architecture have revealed the emergence of semantically meaningful patterns in the attention heads, particularly resembling Gaussian functions centered on each point of the input shape. In this work, we further investigate this phenomenon by integrating these patterns as fixed attention weights within the attention heads of the Transformer architecture. We evaluate two variants: one utilizing predetermined variance values for the Gaussians, and another where the variance values are treated as learnable parameters. Additionally, we analyze performance on noisy data and explore a possible way to improve robustness to noise. Our findings demonstrate that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
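As a rough illustration of the idea of Gaussian functions centered on each point acting as fixed attention weights, the following minimal sketch builds a row-normalized Gaussian weight matrix from point coordinates and uses it to mix per-point features. It is not the authors' implementation: the function names, the use of Euclidean distance, the single shared `sigma`, and the toy data are all assumptions made for this example.

```python
import math

def gaussian_attention(points, sigma):
    """Return a row-normalized matrix of Gaussian weights centered on each point.

    Assumption: weights depend on Euclidean distance between points and
    one shared standard deviation `sigma` (hypothetical simplification).
    """
    weights = []
    for p in points:
        row = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * sigma ** 2))
            for q in points
        ]
        total = sum(row)
        weights.append([w / total for w in row])  # each row sums to 1, like softmax output
    return weights

def apply_attention(weights, values):
    """Mix per-point feature vectors using the fixed weights (no learned queries or keys)."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]

# Toy input: two nearby points and one far away (hypothetical data).
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
features = [[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]

attn = gaussian_attention(points, sigma=0.5)
mixed = apply_attention(attn, features)
```

In this sketch the far-away point attends almost exclusively to itself, while the two nearby points share their features; making `sigma` a trainable parameter would correspond to the learnable-variance variant mentioned in the abstract.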
Furthermore, we conducted an ablation study to identify the specific layers where the infused information is most impactful and to understand the reliance of the network on this information.

Item Smart Tools and Applications in Graphics - Eurographics Italian Chapter Conference: Frontmatter (The Eurographics Association, 2024) Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos

Item A Simple Improvement to PIP-Net for Medical Image Anomaly Detection (The Eurographics Association, 2024) Kobayashi, Yuki; Yamaguchi, Yasushi; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
The application of AI technology in domains requiring decision accountability, such as healthcare, has increased the demand for model interpretability. The part-prototype model is a well-established interpretable approach for image recognition, with PIP-Net demonstrating strong classification performance and high interpretability in multiclass classification tasks. However, PIP-Net assumes the presence of class-specific prototypes. This assumption does not hold for tasks like anomaly detection, where no local features are exclusive to the normal class. To address this, we propose an architecture that learns only the scores corresponding to the anomaly class for each prototype. This approach is based on more reasonable assumptions for anomaly detection than PIP-Net and enables concise inference using fewer prototypes.
Evaluation of this approach using the MURA dataset, a large dataset of bone X-rays, revealed that the proposed architecture achieved better anomaly detection performance than the original PIP-Net with fewer prototypes.

Item Mesh Comparison Using Regular Grids (The Eurographics Association, 2024) Kaye, Patrizia; Ivrissimtzis, Ioannis; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
A symmetric grid-based approach to mesh comparison is proposed, providing intuitive visual results alongside an objective measure of the local differences between meshes. The difference function is defined on the nodes of a regular 3D lattice, making it suitable as input for a variety of analysis algorithms. The visual results are compared with, and found comparable to, those of the Metro tool.

Item To What Extent Are Existing Volume Mapping Algorithms Practically Useful? (The Eurographics Association, 2024) Meloni, Federico; Cherchi, Gianmarco; Scateni, Riccardo; Livesu, Marco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Mappings between geometric domains play a crucial role in many algorithms in geometry processing and are heavily used in various applications. Despite the significant progress made in recent years, the challenge of reliably mapping two volumes still needs to be solved to an extent that is satisfactory for practical applications. This paper offers a review of provably robust volume mapping algorithms, evaluating their performance in terms of time, memory and ability to generate a correct result both with exact and inexact numerical models. We have chosen and evaluated the two most advanced methods currently available, using a state-of-the-art benchmark designed specifically for this type of analysis. We are sharing both the statistical results and specific volume mappings with the community, which can be utilized by future algorithms for direct comparative analysis.
We also provide utilities for reading, writing, and validating volume maps encoded with exact rational coordinates, which is the natural form of output for robust algorithms in this class. All in all, this benchmark offers a neat overview of where we stand in terms of ability to reliably solve the volume mapping problem, also providing practical data and tools that enable the community to compare future algorithmic developments without the need to re-run existing methods.

Item Disk-NeuralRTI: Optimized NeuralRTI Relighting through Knowledge Distillation (The Eurographics Association, 2024) Dulecha, Tinsae Gebrechristos; Righetto, Leonardo; Pintus, Ruggero; Gobbetti, Enrico; Giachetti, Andrea; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Relightable images created from Multi-Light Image Collections (MLICs) are among the most employed models for interactive object exploration in cultural heritage (CH). In recent years, neural representations have been shown to produce higher-quality images at similar storage costs to the more classic analytical models such as Polynomial Texture Maps (PTM) or Hemispherical Harmonics (HSH). However, the Neural RTI models proposed in the literature perform the image relighting with decoder networks with a high number of parameters, making decoding slower than for classical methods. Despite recent efforts targeting model reduction and multi-resolution adaptive rendering, exploring high-resolution images, especially on high-pixel-count displays, still requires significant resources and is only achievable through progressive rendering in typical setups. In this work, we show how, by using knowledge distillation from an original (teacher) Neural RTI network, it is possible to create a more efficient RTI decoder (student network).
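For readers unfamiliar with knowledge distillation, the sketch below shows one generic way a student can be supervised by both the ground-truth pixel value and the teacher's prediction. The abstract does not specify the loss actually used, so the weighted mean-squared-error form, the `alpha` parameter, and the sample RGB values are assumptions for illustration only.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Generic response-based distillation loss (hypothetical form).

    alpha weights the ground-truth term; (1 - alpha) weights the term
    that pulls the student toward the teacher's prediction.
    """
    return alpha * mse(student_out, target) + (1.0 - alpha) * mse(student_out, teacher_out)

# Toy relit-pixel RGB values (made-up numbers for illustration).
teacher_rgb = [0.62, 0.40, 0.31]   # prediction of the large teacher decoder
target_rgb = [0.60, 0.41, 0.30]    # captured pixel under the same light direction
student_rgb = [0.61, 0.40, 0.30]   # prediction of the small student decoder

loss = distillation_loss(student_rgb, teacher_rgb, target_rgb, alpha=0.5)
```

With `alpha=1.0` the student is trained only on the captured data, and with `alpha=0.0` it only imitates the teacher; intermediate values trade off the two signals.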
We evaluated the performance of the network compression approach on existing RTI relighting benchmarks, including both synthetic and real datasets, and on novel acquisitions of high-resolution images. Experimental results show that we can keep the student prediction close to the teacher with up to 80% parameter reduction and almost ten times faster rendering when embedded in an online viewer.

Item A Study on the Use of High Dynamic Range Imaging for Gaussian Splatting Methods: Are 8 bits Enough? (The Eurographics Association, 2024) Piras, Valentina; Bonatti, Amedeo Franco; Maria, Carmelo De; Cignoni, Paolo; Banterle, Francesco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
The recent rise of Neural Radiance Fields (NeRFs)-like methods has revolutionized high-fidelity scene reconstruction, with 3D Gaussian Splatting (3DGS) standing out for its ability to generate photorealistic images while maintaining fast, efficient rendering. 3DGS delivers high-fidelity representations of complex scenes at any scale (from very small objects to entire cities), accurately capturing geometry, materials, and lighting, while meeting the need for fast and efficient rendering, which is crucial for applications requiring real-time performance. Although High Dynamic Range (HDR) technology, which enables the capture of comprehensive real-world lighting information, has been used in novel view synthesis, several questions remain unanswered. For example, does HDR improve the overall quality of reconstruction? Are 8 bits enough? Can tone-mapped images be a balanced compromise regarding quality and details?
To answer such questions, in this work, we study the application of HDR technology on the 3DGS method for acquiring real-world scenes.

Item Evaluating AI-based static stereoscopic rendering of indoor panoramic scenes (The Eurographics Association, 2024) Jashari, Sara; Tukur, Muhammad; Boraey, Yehia; Alzubaidi, Mahmood; Pintore, Giovanni; Gobbetti, Enrico; Villanueva, Alberto Jaspe; Schneider, Jens; Fetais, Noora; Agus, Marco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Panoramic imaging has recently become an extensively used technology for the representation and exploration of indoor environments. Panoramic cameras generate omnidirectional images that provide a comprehensive 360-degree view, making them a valuable tool for applications such as virtual tours in real estate, architecture, and cultural heritage. However, constructing truly immersive experiences from panoramic images presents challenges, particularly in generating panoramic stereo pairs that offer consistent depth cues and visual comfort across all viewing directions. Traditional stereo-imaging techniques do not directly apply to spherical panoramic images, requiring complex processing to avoid artifacts that can disrupt immersion. To address these challenges, various imaging and processing technologies have been developed, including multi-camera systems and computational methods that generate stereo images from a single panoramic input. Although effective, these solutions often involve complicated hardware and processing pipelines. Recently, deep learning approaches have emerged, enabling novel view generation from single panoramic images. While these methods show promise, they have not yet been thoroughly evaluated in practical scenarios.
This paper presents a series of evaluation experiments aimed at assessing different technologies for creating static stereoscopic environments from omnidirectional imagery, with a focus on 3DOF immersive exploration. A user study was conducted using a WebXR prototype and a Meta Quest 3 headset to quantitatively and qualitatively compare traditional image composition techniques with AI-based methods. Our results indicate that while traditional methods provide a satisfactory level of immersion, AI-based generation is nearing a quality level suitable for deployment in web-based environments.

Item Persistent Homology vs. Learning Methods: A Comparative Study in Limited Data Scenarios (The Eurographics Association, 2024) Via, Andrea Di; Via, Roberto Di; Fugacci, Ulderico; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
This exploratory study compares persistent homology methods with traditional machine learning and deep learning techniques for label-efficient classification. We propose pure topological approaches, including persistence thresholding and Bottleneck distance classification, and explore hybrid methods combining persistent homology with machine learning. These are evaluated against conventional machine learning algorithms and deep neural networks on two binary classification tasks: surface crack detection and malaria cell identification. We assess performance across various numbers of samples per class, ranging from 1 to 500. Our study highlights the efficacy of persistent homology-based methods in low-data scenarios. Using the Bottleneck distance approach, we achieve 95.95% accuracy in crack detection and 93.11% in malaria diagnosis with only one labeled sample per class. These results outperform the best performance from machine learning models, which achieve 69.40% and 39.75% accuracy, respectively, and deep learning models, which attain up to 95.96% in crack detection and 62.72% in malaria diagnosis.
This demonstrates the superior performance of topological methods in classification tasks with few labeled data. Hybrid approaches demonstrate enhanced performance as the number of labeled samples increases, effectively leveraging topological features to boost classification accuracy. This study highlights the robustness of topological methods in extracting meaningful features from limited data, offering promising directions for efficient, label-conserving classification strategies. The results underscore the worth of persistent homology, both as a standalone tool and in combination with machine learning, particularly in domains where labeled data scarcity challenges traditional deep learning approaches.

Item The use of Virtual Reality in preserving and reactivating immersive audio art installations: the case of Dissonanze Circolari by Roberto Taroni (The Eurographics Association, 2024) Russo, Alessandro; Fayyaz, Nikoo; Franceschini, Andrea; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Interactive multimedia artworks pose unique challenges for their preservation, such as the obsolescence of original components, software, and playback devices, and other issues related to their interactive and time-based nature. The Centro di Sonologia Computazionale (CSC) of the University of Padova developed the Multilevel Dynamic Preservation (MDP) model, which aims at ensuring the long-term preservation of multimedia artworks by treating them as dynamic objects. Reactivation is a fundamental step for allowing their preservation, and, among various reactivation strategies, Virtual Reality (VR) provides a unique opportunity to recreate the immersive experience while still maintaining the concept of the original artwork. The CSC started to work together with Italian artist Roberto Taroni, a central figure in the experimental scenario, who often combined music and visual arts in his works.
This contribution concerns the reactivation in VR of Roberto Taroni's artwork "Dissonanze Circolari" from 1999. This installation featured a room with 16 speakers, each one playing a fragment of Beethoven's piano performance, Op. 111, executed by different musicians, creating a dissonance-based immersive experience. The reactivation was carried out using the documentation provided by the artist and the audio samples from the original installation. The VR environment was created using the game engine Unreal Engine 5. This reactivation approach makes it possible to maximize access to the artwork, providing new information for curators, scholars, and art enthusiasts.

Item S4A: Scalable Spectral Statistical Shape Analysis (The Eurographics Association, 2024) Maccarone, Francesca; Longari, Giorgio; Viganò, Giulio; Peruzzo, Denis; Maggioli, Filippo; Melzi, Simone; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Statistical shape analysis is a crucial technique for studying deformations within collections of shapes, particularly in the field of Medical Imaging. However, the high density of meshes typically used to represent medical data poses a challenge for standard geometry processing tools due to their limited efficiency. While spectral approaches offer a promising solution by effectively handling high-frequency variations inherent in such data, their scalability is questioned by their need to solve eigendecompositions of large sparse matrices. In this paper, we introduce S4A, a novel and efficient method based on spectral geometry processing that addresses these issues with a low computational cost.
It operates in four stages: (i) establishing correspondences between each pair of shapes in the collection, (ii) defining a common latent space to encode deformations across the entire collection, (iii) computing statistical quantities to identify, highlight, and measure the most representative variations within the collection, and (iv) performing information transfer from labeled data to large collections of shapes. Unlike previous methods, S4A provides a highly efficient solution across all stages of the process. We demonstrate the advantages of our approach by comparing its accuracy and computational efficiency to existing pipelines, and by showcasing the comprehensive statistical insights that can be derived from applying our method to a collection of medical data.

Item DDD: Deep indoor panoramic Depth estimation with Density maps consistency (The Eurographics Association, 2024) Pintore, Giovanni; Agus, Marco; Signoroni, Alberto; Gobbetti, Enrico; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
We introduce a novel deep neural network for rapid and structurally consistent monocular 360° depth estimation in indoor environments. The network infers a depth map from a single gravity-aligned or gravity-rectified equirectangular image of the environment, ensuring that the predicted depth aligns with the typical depth distribution and features of cluttered interior spaces, which are usually enclosed by walls, ceilings, and floors. By leveraging the distinct characteristics of vertical and horizontal features in man-made indoor environments, we introduce a lean network architecture that employs gravity-aligned feature flattening and specialized vision transformers that utilize the input's omnidirectional nature, without segmentation into patches and positional encoding.
To enhance the structural consistency of the predicted depth, we introduce a new loss function that evaluates the consistency of density maps by projecting points derived from the inferred depth map onto horizontal and vertical planes. This lightweight architecture has very small computational demands, provides greater structural consistency than competing methods, and does not require the explicit imposition of strong structural priors.

Item Semantic Stylization and Shading via Segmentation Atlas utilizing Deep Learning Approaches (The Eurographics Association, 2024) Sinha, Saptarshi Neil; Kühn, Paul Julius; Rojtberg, Pavel; Graf, Holger; Kuijper, Arjan; Weinmann, Michael; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
We present a novel hybrid approach for semantic stylization of surface materials of 3D models while preserving shading. Our approach builds on directly applying style transfer to the object surface obtained by learning-based or traditional methods such as 3D scanners or structured light systems, thereby overcoming artifacts such as halos, ghosting, or poor quality of the geometric representation produced by other 3D stylization methods. For this purpose, our method involves (i) the initial generation of a segmentation map parameterized over the object surface, inferred using a deep-learning-based foundation model, to guide the stylization and shading of different regions of the 3D model, and (ii) a subsequent 2D style transfer that allows the exchange or stylization of surface materials in high quality. By delivering high-quality semantic perceptive reconstructions in a shorter timeframe than current approaches using manual 3D segmentation and stylization, our approach holds significant potential for various application scenarios including creative design, architecture and cultural heritage.