Volume 42 (2023)
Browsing Volume 42 (2023) by Issue Date
Now showing 1 - 20 of 243
Item: OaIF: Occlusion-Aware Implicit Function for Clothed Human Reconstruction (© 2023 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2023)
Tan, Yudi; Guan, Boliang; Zhou, Fan; Su, Zhuo; Hauser, Helwig and Alliez, Pierre
Clothed human reconstruction from a monocular image is challenging due to occlusion, depth ambiguity and variations in body pose. Recently, shape representations based on implicit functions have proven better suited to the complex topology of clothed humans than explicit representations such as meshes and voxels. This is mainly achieved by using pixel-aligned features, which enable the implicit function to capture local details. However, such methods use an identical feature map for all sampled points when gathering local features, making their models occlusion-agnostic in the encoding stage. The decoder, as the implicit function, only maps features and does not take occlusion into account explicitly. Thus, these methods fail to generalize well to poses with severe self-occlusion. To address this, we present OaIF, which encodes local features conditioned on the visibility of SMPL vertices. OaIF projects SMPL vertices onto the image plane to obtain image features masked by visibility. Vertex features integrated with the geometric information of the mesh are then fed into a graph attention (GAT) network for joint encoding. We query hybrid features and occlusion factors for points through cross attention and learn occupancy fields for clothed humans. The experiments demonstrate that OaIF achieves more robust and accurate reconstruction than the state of the art on both public datasets and wild images.

Item: Triangle Influence Supersets for Fast Distance Computation (© 2023 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2023)
Pujol, Eduard; Chica, Antonio; Hauser, Helwig and Alliez, Pierre
We present an acceleration structure to efficiently query the Signed Distance Field (SDF) of volumes represented by triangle meshes. The method is based on a discretization of space. In each node, we store the triangles defining the SDF behaviour in that region. Consequently, we reduce the cost of the nearest-triangle search, prioritizing query performance while avoiding approximations of the field. We propose a method to conservatively compute the set of triangles influencing each node. Given a node, each triangle defines a region of space such that all points inside it are closer to a point in the node than the triangle is. This property is used to build the SDF acceleration structure. We do not need to compute these regions explicitly, which is crucial to the performance of our approach. We prove the correctness of the proposed method and compare it to similar approaches, confirming that our method produces faster query times than other exact methods.
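A minimal sketch of the query pattern such a structure accelerates, assuming the per-cell candidate lists (the influence supersets) have already been computed conservatively; the uniform grid layout, the dictionary keys, and the omitted sign recovery are illustrative, not the authors' implementation:

    import numpy as np

    def closest_point_on_triangle(p, a, b, c):
        # Standard clamped-barycentric projection of p onto triangle (a, b, c)
        # (see e.g. Ericson, "Real-Time Collision Detection").
        ab, ac, ap = b - a, c - a, p - a
        d1, d2 = ab @ ap, ac @ ap
        if d1 <= 0 and d2 <= 0: return a
        bp = p - b
        d3, d4 = ab @ bp, ac @ bp
        if d3 >= 0 and d4 <= d3: return b
        vc = d1 * d4 - d3 * d2
        if vc <= 0 and d1 >= 0 and d3 <= 0:
            return a + ab * (d1 / (d1 - d3))
        cp = p - c
        d5, d6 = ab @ cp, ac @ cp
        if d6 >= 0 and d5 <= d6: return c
        vb = d5 * d2 - d1 * d6
        if vb <= 0 and d2 >= 0 and d6 <= 0:
            return a + ac * (d2 / (d2 - d6))
        va = d3 * d6 - d5 * d4
        if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
            return b + (c - b) * ((d4 - d3) / ((d4 - d3) + (d5 - d6)))
        denom = 1.0 / (va + vb + vc)
        return a + ab * (vb * denom) + ac * (vc * denom)

    def unsigned_distance(p, grid, cell_size, tris, verts):
        # grid: dict mapping a cell index tuple to the ids of the triangles
        # influencing that cell. Only those candidates are tested; the
        # superset property guarantees the true nearest triangle is among them.
        cell = tuple((p // cell_size).astype(int))
        best = np.inf
        for t in grid[cell]:
            a, b, c = verts[tris[t]]
            best = min(best, np.linalg.norm(p - closest_point_on_triangle(p, a, b, c)))
        return best  # sign (inside/outside) recovery omitted in this sketch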
Item: A Survey of Personalized Interior Design (© 2023 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2023)
Wang, Y.T.; Liang, C.; Huai, N.; Chen, J.; Zhang, C.J.; Hauser, Helwig and Alliez, Pierre
Interior design is the core step of interior decoration, and it determines the overall layout and style of furniture. Traditional interior design is usually laborious and time-consuming work carried out by professional designers and cannot always meet clients' personalized requirements. With the development of computer graphics, computer vision and machine learning, computer scientists have carried out much fruitful research in computer-aided personalized interior design (PID). In general, personalization research in interior design mainly focuses on furniture selection and floor plan preparation. For the former, personalized furniture selection is achieved by selecting furniture that matches the resident's preferences and style, while the latter allows residents to personalize the design and planning of their floor plan. Finally, the automatic furniture layout task generates a stylistically matched and functionally complete furniture layout based on the selected furniture and the prepared floor plan. The main challenge for PID is therefore meeting residents' personalized requirements in terms of both furniture and floor plans. This paper addresses this challenge by reviewing recent progress in five separate but correlated areas: furniture style analysis, furniture compatibility prediction, floor plan design, floor plan analysis and automatic furniture layout. For each topic, we review representative methods and compare and discuss their strengths and shortcomings. In addition, we collect and summarize public datasets related to PID and finally discuss its future research directions.

Item: Reference-based Screentone Transfer via Pattern Correspondence and Regularization (© 2023 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2023)
Li, Zhansheng; Zhao, Nanxuan; Wu, Zongwei; Dai, Yihua; Wang, Junle; Jing, Yanqing; He, Shengfeng; Hauser, Helwig and Alliez, Pierre
Adding screentone to initial line drawings is a crucial step in manga generation, but it is a tedious and labour-intensive task. In this work, we propose a novel data-driven method that transfers the screentone pattern from a reference manga image. This not only ensures quality but also adds controllability to the generated manga. The reference-based screentone translation task poses several unique challenges. Since a manga image, as an abstract art form, often contains multiple screentone patterns interwoven with line drawing, extracting a disentangled style code from the reference is difficult. Finding correspondences between the reference and an input line drawing without any screentone is also hard. Moreover, since screentone contains many subtle details, guaranteeing style consistency with the reference remains challenging. To resolve these difficulties, we propose a novel Reference-based Screentone Transfer Network (RSTN). We encode the screentone style through a 1D stylegram. A patch correspondence loss is designed to build a similarity mapping function that guides the translation. To mitigate generated artefacts, a pattern regularization loss is introduced at the patch level. Through extensive experiments and a user study, we demonstrate the effectiveness of our proposed model.
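One plausible realization of a patch-level similarity mapping between reference and line-drawing features, sketched here with cosine similarity over non-overlapping patches; the feature encoders, patch size, and similarity choice are assumptions for illustration, not RSTN's exact loss:

    import torch
    import torch.nn.functional as F

    def patch_correspondence(ref_feat, line_feat, patch=8):
        # ref_feat, line_feat: (C, H, W) feature maps of the reference manga
        # and the input line drawing (produced by some assumed encoder).
        ref = F.unfold(ref_feat[None], patch, stride=patch)[0]   # (C*p*p, Nr)
        src = F.unfold(line_feat[None], patch, stride=patch)[0]  # (C*p*p, Ns)
        sim = F.normalize(src, dim=0).T @ F.normalize(ref, dim=0)  # (Ns, Nr)
        # For each line-drawing patch, the index of its best reference patch;
        # a correspondence loss can then pull translated patches toward them.
        return sim.argmax(dim=1)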
Item: Physics-Informed Neural Corrector for Deformation-based Fluid Control (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Tang, Jingwei; Kim, Byungsoo; Azevedo, Vinicius C.; Solenthaler, Barbara; Myszkowski, Karol; Niessner, Matthias
Controlling fluid simulations is notoriously difficult due to their high computational cost and the fact that user control inputs can cause unphysical motion. We present an interactive method for deformation-based fluid control. Our method aims at balancing direct deformations of fluid fields against the preservation of physical characteristics. We train convolutional neural networks with physics-inspired loss functions together with a differentiable fluid simulator, and provide an efficient workflow for flow manipulation at test time. We demonstrate diverse test cases to analyze our carefully designed objectives and show that they lead to physically plausible and visually appealing modifications of edited fluid data.
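The listing does not spell out the physics-inspired losses; one standard candidate for such an objective is an incompressibility penalty on the corrected velocity field, sketched below on a 2D grid with central differences (the function name, weights, and the combination with the deformation term are placeholders, not the paper's exact formulation):

    import torch

    def divergence_loss(vel, dx=1.0):
        # vel: (B, 2, H, W) velocity field predicted by the corrector network.
        # Central differences of du/dx + dv/dy over the grid interior.
        u, v = vel[:, 0], vel[:, 1]
        du_dx = (u[:, 1:-1, 2:] - u[:, 1:-1, :-2]) / (2 * dx)
        dv_dy = (v[:, 2:, 1:-1] - v[:, :-2, 1:-1]) / (2 * dx)
        div = du_dx + dv_dy
        return (div ** 2).mean()

    # A hedged total objective: follow the user's deformation while staying
    # physically plausible, e.g.
    #   loss = ((vel - vel_deformed) ** 2).mean() + lam * divergence_loss(vel)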
Item: HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques (Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2023)
Chatzimparmpas, A.; Paulovich, F. V.; Kerren, A.; Hauser, Helwig and Alliez, Pierre
Despite the tremendous advances in machine learning (ML), training with imbalanced data still poses challenges in many real-world applications. Among a series of diverse techniques for solving this problem, sampling algorithms are regarded as an efficient solution. However, the problem is more fundamental, with many works emphasizing the importance of instance hardness. This issue refers to the significance of managing unsafe or potentially noisy instances that are more likely to be misclassified and serve as the root cause of poor classification performance. This paper introduces HardVis, a visual analytics system designed to handle instance hardness, mainly in imbalanced classification scenarios. Our system assists users in visually comparing the distributions of different data types, selecting types of instances based on local characteristics that will later be affected by the chosen sampling method, and validating which suggestions from undersampling or oversampling techniques are beneficial for the ML model. Additionally, rather than uniformly undersampling or oversampling a specific class, we allow users to find and sample easy- and difficult-to-classify training instances from all classes. Users can explore subsets of the data from different perspectives to choose all of these parameters, while HardVis keeps track of their steps and evaluates the model's predictive performance on a separate test set. The end result is a well-balanced dataset that boosts the predictive power of the ML model. The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case. Finally, we assess the usefulness of our system based on feedback received from ML experts.

Item: Ferret: Reviewing Tabular Datasets for Manipulation (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Lange, Devin; Sahai, Shaurya; Phillips, Jeff M.; Lex, Alexander; Bujack, Roxana; Archambault, Daniel; Schreck, Tobias
How do we ensure the veracity of science? The act of manipulating or fabricating scientific data has led to many high-profile fraud cases and retractions. Detecting manipulated data, however, is a challenging and time-consuming endeavor. Automated detection methods are limited due to the diversity of data types and manipulation techniques. Furthermore, patterns automatically flagged as suspicious can have reasonable explanations. Instead, we propose a nuanced approach in which experts analyze tabular datasets, e.g., as part of the peer-review process, using a guided, interactive visualization approach. In this paper, we present an analysis of how manipulated datasets are created and the artifacts these techniques generate. Based on these findings, we propose a suite of visualization methods to surface potential irregularities. We have implemented these methods in Ferret, a visualization tool for data forensics work. Ferret makes potential data issues salient and provides guidance on spotting signs of tampering and differentiating them from truthful data.

Item: Interactive Control over Temporal Consistency while Stylizing Video Streams (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Shekhar, Sumit; Reimann, Max; Hilscher, Moritz; Semmo, Amir; Döllner, Jürgen; Trapp, Matthias; Ritschel, Tobias; Weidlich, Andrea
Image stylization has seen significant advancement and widespread interest over the years, leading to the development of a multitude of techniques. Extending these stylization techniques, such as Neural Style Transfer (NST), to videos is often achieved by applying them on a per-frame basis. However, per-frame stylization usually lacks temporal consistency, which manifests as undesirable flickering artifacts. Most existing approaches for enforcing temporal consistency suffer from one or more of the following drawbacks: they (1) are only suitable for a limited range of techniques, (2) do not support online processing as they require the complete video as input, (3) cannot provide consistency for the task of stylization, or (4) do not provide interactive consistency control. Domain-agnostic techniques for temporal consistency aim to eradicate flickering completely but typically disregard aesthetic aspects. For stylization tasks, however, consistency control is an essential requirement, as a certain amount of flickering adds to the artistic look and feel. Moreover, making this control interactive is paramount from a usability perspective. To achieve these requirements, we propose an approach that stylizes video streams in real time at full HD resolution while providing interactive consistency control. We develop a lite optical-flow network that operates at 80 frames per second (FPS) on desktop systems with sufficient accuracy. Further, we employ an adaptive combination of local and global consistency features and enable interactive selection between them. Objective and subjective evaluations demonstrate that our method is superior to state-of-the-art video consistency approaches.
maxreimann.github.io/stream-consistency
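The building block behind such consistency control can be sketched as flow-guided blending: warp the previous consistent output into the current frame and mix it with the fresh per-frame stylization, with the blend weight acting as the user-facing consistency knob. In this hedged sketch the paper's adaptive local/global weighting is reduced to a single scalar, and occlusion handling is omitted:

    import cv2
    import numpy as np

    def consistent_frame(stylized, prev_output, flow, consistency=0.5):
        # stylized:    current frame after per-frame stylization, HxWx3 float32
        # prev_output: previous consistent output frame, HxWx3 float32
        # flow:        optical flow from the current frame to the previous one,
        #              HxWx2 float32
        h, w = flow.shape[:2]
        grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (grid_x + flow[..., 0]).astype(np.float32)
        map_y = (grid_y + flow[..., 1]).astype(np.float32)
        warped = cv2.remap(prev_output, map_x, map_y, cv2.INTER_LINEAR)
        # consistency = 0 keeps all per-frame flicker; 1 is maximally smooth.
        return (1 - consistency) * stylized + consistency * warped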
Item: VOLMAP: a Large Scale Benchmark for Volume Mappings to Simple Base Domains (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Cherchi, Gianmarco; Livesu, Marco; Memari, Pooran; Solomon, Justin
Correspondences between geometric domains (mappings) are ubiquitous in computer graphics and engineering, both for a variety of downstream applications and as core building blocks for higher-level algorithms. In particular, mapping a shape to a convex or star-shaped domain with simple geometry is a fundamental module in existing pipelines for mesh generation, solid texturing, generation of shape correspondences, advanced manufacturing, etc. For the case of surfaces, computing such a mapping with guarantees of injectivity is a solved problem. Conversely, robust algorithms for the generation of injective volume mappings to simple polytopes are yet to be found, making this a fundamental open problem in volume mesh processing. VOLMAP is a large-scale benchmark aimed at supporting ongoing research in volume mapping algorithms. The dataset contains 4.7K tetrahedral meshes, whose boundary vertices are mapped to a variety of simple domains, either convex or star-shaped. This data constitutes the input for candidate algorithms, which are then required to position interior vertices in the domain to obtain a volume map. Overall, this yields more than 22K alternative test cases. VOLMAP also comprises tools to process this data, analyze the resulting maps, and extend the dataset with new meshes, boundary maps and base domains. This article provides a brief overview of the field, discussing its importance and the lack of effective techniques. We then introduce both the dataset and its major features. An example of comparative analysis between two existing methods is also presented.

Item: Stochastic Subsets for BVH Construction (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Tessari, Lorenzo; Dittebrandt, Addis; Doyle, Michael J.; Benthin, Carsten; Myszkowski, Karol; Niessner, Matthias
BVH construction is a critical component of real-time and interactive ray-tracing systems. However, BVH construction can be both compute and bandwidth intensive, especially when a large amount of dynamic geometry is present. Build algorithms vary substantially in the traversal performance they produce, making high-quality construction algorithms desirable. However, high-quality algorithms, such as top-down construction, are typically more expensive, limiting their benefit in real-time and interactive contexts. One particular challenge of high-quality top-down construction algorithms is that the large working set at the top of the tree can make constructing these levels bandwidth intensive, due to O(n log n) complexity, limited cache locality, and less dense compute at these levels. To address this limitation, we propose a novel stochastic approach to GPU BVH construction that selects a representative subset of primitives to build the upper levels of the tree. In a second pass, the remaining primitives are clustered around the BVH leaves and further processed into a complete BVH. We show that our novel approach significantly reduces the construction time of top-down GPU BVH builders by a factor of up to 1.8x, while achieving competitive rendering performance in most cases and exceeding it in others.
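A hedged CPU sketch of the two-pass idea (the paper targets GPU kernels; the median-split builder and brute-force clustering below are simplified stand-ins): build the upper levels from a random subset of primitives, then cluster every remaining primitive around the nearest subset leaf before the leaves are split further and the boxes refit:

    import numpy as np
    from dataclasses import dataclass, field

    @dataclass
    class Leaf:
        centroid: np.ndarray
        prims: list = field(default_factory=list)

    def median_split_leaves(centroids, ids, leaf_size=8):
        # Tiny stand-in for a real top-down (e.g. SAH) builder: recursively
        # median-split along the widest axis until the leaves are small.
        if len(ids) <= leaf_size:
            return [Leaf(centroids[ids].mean(axis=0), list(ids))]
        axis = np.ptp(centroids[ids], axis=0).argmax()
        order = ids[np.argsort(centroids[ids, axis])]
        mid = len(order) // 2
        return (median_split_leaves(centroids, order[:mid], leaf_size)
                + median_split_leaves(centroids, order[mid:], leaf_size))

    def stochastic_subset_build(centroids, subset_ratio=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n = len(centroids)
        subset = rng.choice(n, size=max(2, int(n * subset_ratio)), replace=False)
        # Pass 1: build the upper tree from the representative subset only.
        leaves = median_split_leaves(centroids, subset)
        # Pass 2: cluster the remaining primitives around the subset leaves
        # (brute-force nearest centroid here, purely for clarity).
        rest = np.setdiff1d(np.arange(n), subset)
        leaf_pts = np.stack([l.centroid for l in leaves])
        nearest = np.linalg.norm(
            centroids[rest, None, :] - leaf_pts[None, :, :], axis=2).argmin(axis=1)
        for p, l in zip(rest, nearest):
            leaves[l].prims.append(int(p))
        # Each enlarged leaf would then be built into a subtree and refit.
        return leaves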
Item: xOpat: eXplainable Open Pathology Analysis Tool (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Horák, Jirí; Furmanová, Katarína; Kozlíková, Barbora; Brázdil, Tomáš; Holub, Petr; Kacenga, Martin; Gallo, Matej; Nenutil, Rudolf; Byška, Jan; Rusnak, Vit; Bujack, Roxana; Archambault, Daniel; Schreck, Tobias
Histopathology research evolves quickly thanks to advances in whole slide imaging (WSI) and artificial intelligence (AI). However, existing WSI viewers are tailored either for clinical or for research environments, and none suits both. This hinders the adoption of new methods and the communication between researchers and clinicians. This paper presents xOpat, an open-source, browser-based WSI viewer that addresses these problems. xOpat supports various data sources, such as tissue images, pathologists' annotations, or additional data produced by AI models. Furthermore, it provides efficient rendering of multiple data layers, their visual representations, and tools for annotating and presenting findings. Thanks to its modular, protocol-agnostic, and extensible architecture, xOpat can be easily integrated into different environments and thus helps to bridge the gap between research and clinical practice. To demonstrate the utility of xOpat, we present three case studies, one conducted with a developer of AI algorithms for image segmentation and two with a research pathologist.

Item: A Characterization of Interactive Visual Data Stories With a Spatio-Temporal Context (© 2023 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2023)
Mayer, Benedikt; Steinhauer, Nastasja; Preim, Bernhard; Meuschke, Monique; Hauser, Helwig and Alliez, Pierre
Large-scale issues with a spatial and temporal context, such as the COVID-19 pandemic, the war against Ukraine, and climate change, have drawn a lot of attention to visual storytelling with data in online journalism, confirming its high effectiveness and relevance for conveying stories. New ways have thus emerged that expand the space of visual storytelling techniques. However, interactive visual data stories with a spatio-temporal context have not been studied extensively yet. Quantitative information about the layout and media used, the visual storytelling techniques, and the visual encoding of space-time is particularly relevant for a deeper understanding of how such stories are commonly built to convey complex information in a comprehensible way. Covering these three aspects, we propose a design space derived by merging and adjusting existing approaches, which we used to categorize 130 collected web-based visual data stories with a spatio-temporal context from between 2018 and 2022. An analysis of the collected data reveals the power of large-scale issues to shape the landscape of storytelling techniques, and a trend towards simplified consumability of stories. Taken together, our findings can serve story authors as inspiration for which storytelling techniques to include in their own spatio-temporal data stories.

Item: Don't Peek at My Chart: Privacy-preserving Visualization for Mobile Devices (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Zhang, Songheng; Ma, Dong; Wang, Yong; Bujack, Roxana; Archambault, Daniel; Schreck, Tobias
Data visualizations have been widely used on mobile devices like smartphones for various tasks (e.g., visualizing personal health and financial data), making it convenient for people to view such data anytime and anywhere. However, others nearby can also easily peek at the visualizations, resulting in personal data disclosure. In this paper, we propose a perception-driven approach to transform mobile data visualizations into privacy-preserving ones. Specifically, based on human visual perception, we develop a masking scheme to adjust the spatial frequency and luminance contrast of colored visualizations. The resulting visualization retains its original information in close proximity but reduces its visibility when viewed from a certain distance or farther away. We conducted two user studies to inform the design of our approach (N=16) and to systematically evaluate its performance (N=18). The results demonstrate the effectiveness of our approach in preserving the privacy of mobile data visualizations.
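The perceptual intuition can be illustrated as moving chart content from low to high spatial frequencies while compressing low-frequency contrast: distance effectively low-pass filters what a viewer sees, so a distant onlooker perceives only the flattened image. The sketch below is only one way to realize this intuition, and all constants are illustrative, not the calibrated values from the paper's user studies:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def privacy_mask(lum, contrast=0.25, freq=0.25, amp=0.2, sigma=6.0):
        # lum: grayscale chart in [0, 1] (apply per channel for color).
        low = gaussian_filter(lum, sigma)   # what a distant viewer resolves
        detail = lum - low                  # what only a close viewer resolves
        # 1) Flatten low-frequency contrast toward the mean luminance.
        flattened = lum.mean() + contrast * (low - lum.mean())
        # 2) Re-inject the chart content on a high-frequency carrier that
        #    averages out at a distance but stays readable up close.
        h, w = lum.shape
        yy, xx = np.mgrid[0:h, 0:w]
        carrier = np.sin(2 * np.pi * freq * xx)   # freq in cycles per pixel
        return np.clip(flattened + detail + amp * carrier * (lum - lum.mean()),
                       0, 1)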
HexBox brings the major and widely validated paradigm of surface box modeling into the world of hex meshing. The main idea is to allow the user to box-model a volumetric mesh by primarily modifying its surface through a set of topological and geometric operations. We support, in particular, local and global subdivision, various instantiations of extrusion, removal, and cloning of elements, the creation of non-conformal or conformal grids, as well as shape modifications through vertex positioning, including manual editing, automatic smoothing, or, optionally, projection onto an externally provided target surface. At the core of the method's efficient implementation is the coherent maintenance, at all steps, of two parallel data structures: a hexahedral mesh representing the topology and geometry of the currently modeled shape, and a directed acyclic graph that connects operation nodes to the affected mesh hexahedra. Operations are realized by exploiting recent advancements in grid-based meshing, such as mixing 3-refinement, 2-refinement, and face-refinement, and by using templated topological bridges to enforce on-the-fly mesh conformity across pairs of adjacent elements. A direct manipulation user interface lets users control all operations. The effectiveness of our tool, released as open source to the community, is demonstrated by modeling several complex shapes that are hard to realize with competing tools and techniques.

Item: Novel View Synthesis Of Transparent Object From a Single Image (Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2023)
Zhou, Shizhe; Wang, Zezu; Ye, Dongwei; Hauser, Helwig and Alliez, Pierre
We propose a method for converting a single image of a transparent object into multi-view photos that enable users to observe the object from multiple new angles, without inputting any 3D shape. The complex light paths formed by refraction and reflection make it challenging to compute the lighting effects of transparent objects from a new angle. We construct an encoder-decoder network for normal reconstruction and texture extraction, which enables synthesizing novel views of a transparent object under a set of new views and new environment maps using only one RGB image. By simultaneously considering optical transmission and perspective variation, our network learns the characteristics of optical transmission and the change of perspective as guidance for the conversion from RGB colours to surface normals. A texture extraction subnetwork is proposed to alleviate the loss of contours during normal map generation. We test our method on 3D objects both within and outside our training data, including real 3D objects in our lab and completely new environment maps captured with our phones. The results show that our method performs better than alternatives on view synthesis of transparent objects in complex scenes using only a single-view image.
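Once per-pixel normals are recovered, relighting under a new environment map reduces to reflection and refraction lookups. A hedged single-interface sketch of that shading step (real glass refracts at two interfaces; the envmap_sample callable, the fixed index of refraction, and the Schlick Fresnel term are assumptions for illustration):

    import numpy as np

    def refract(d, n, eta):
        # Snell refraction of unit direction d at unit normal n
        # (eta = 1/ior for air -> glass); None on total internal reflection.
        cos_i = -np.dot(d, n)
        k = 1 - eta ** 2 * (1 - cos_i ** 2)
        if k < 0:
            return None
        return eta * d + (eta * cos_i - np.sqrt(k)) * n

    def shade_pixel(normal, view_dir, envmap_sample, ior=1.5):
        # view_dir points from the camera toward the surface;
        # envmap_sample(direction) -> RGB is an assumed lookup function.
        r = view_dir - 2 * np.dot(view_dir, normal) * normal   # reflection
        t = refract(view_dir, normal, 1.0 / ior)               # refraction
        fresnel = 0.04 + 0.96 * (1 - abs(np.dot(-view_dir, normal))) ** 5
        reflected = envmap_sample(r)
        refracted = envmap_sample(t) if t is not None else reflected
        return fresnel * reflected + (1 - fresnel) * refracted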
Item: Multi-scale Iterative Model-guided Unfolding Network for NLOS Reconstruction (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Su, Xiongfei; Hong, Yu; Ye, Juntian; Xu, Feihu; Yuan, Xin; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Non-line-of-sight (NLOS) imaging can reconstruct hidden objects by analyzing the diffuse reflection off relay surfaces, and has potential applications in autonomous driving, medical imaging and national defense. Despite the challenges of a low signal-to-noise ratio (SNR) and an ill-conditioned inverse problem, NLOS imaging has developed rapidly in recent years. While deep neural networks have achieved impressive success in NLOS imaging, most of them lack flexibility when dealing with multiple spatio-temporal resolutions and multi-scene images in practical applications. To bridge the gap between learning methods and physical priors, we present a novel end-to-end Multi-scale Iterative Model-guided Unfolding (MIMU) network, with superior performance and strong flexibility. Furthermore, we overcome the lack of real training data with a general architecture that can be trained in simulation. Unlike existing encoder-decoder architectures and generative adversarial networks, the proposed method allows a single trained model to adapt to various dimensions, such as different sampling time resolutions, different spatial resolutions and multiple channels for colorful scenes. Simulation and real-data experiments verify that the proposed method achieves better reconstruction results, both qualitatively and quantitatively, than existing methods.

Item: Robust Novel View Synthesis with Color Transform Module (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Kim, Sang Min; Choi, Changwoon; Heo, Hyeongjun; Kim, Young Min; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
The advancements of the Neural Radiance Field (NeRF) and its variants have demonstrated remarkable capabilities in generating photo-realistic novel views from a small set of input images. While recent works suggest various techniques and model architectures that enhance speed or reconstruction quality, little attention has been paid to exploring the RGB color space of the input images. In this paper, we propose a universal color transform module that maximally harnesses the captured evidence for the neural network at hand. The color transform module utilizes an encoder-decoder framework that maps the RGB color space into a new latent space, enhancing the expressiveness of the input domain. We attach the encoder and the decoder at the input and output of a NeRF model of choice, respectively, and jointly optimize them to maintain the cycle consistency of the proposed transform, in addition to minimizing the reconstruction errors in the feature domain. Our comprehensive experiments demonstrate that the learned color space can significantly improve the quality of reconstructions compared to the conventional RGB representation. Its benefits are particularly pronounced in challenging scenarios characterized by low-light environments and scenes with low-textured regions. The proposed color transform pushes the boundaries of limitations in the input domain and offers a promising avenue for advancing the reconstruction capabilities of various neural representations. Source code is available at https://github.com/sangminkim-99/ColorTransformModule.
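A compact sketch of what "cycle consistency plus feature-domain reconstruction" can look like for such a module, assuming the NeRF outputs per-ray values in the learned latent space; the layer sizes, latent dimension, and loss weight are illustrative guesses, not the released implementation:

    import torch
    import torch.nn as nn

    class ColorTransform(nn.Module):
        # Tiny stand-in encoder/decoder pair mapping RGB <-> a learned
        # latent color space.
        def __init__(self, latent_dim=8, hidden=32):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                     nn.Linear(hidden, latent_dim))
            self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 3))

    def losses(ct, nerf_out_latent, gt_rgb, w_cycle=0.1):
        # Reconstruction is measured in the learned feature domain, plus a
        # cycle term keeping the transform invertible on observed colors.
        recon = ((nerf_out_latent - ct.enc(gt_rgb)) ** 2).mean()
        cycle = ((ct.dec(ct.enc(gt_rgb)) - gt_rgb) ** 2).mean()
        return recon + w_cycle * cycle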
The action predictor is learned in an unsupervised manner by training a Gaussian Mixture Variational Autoencoder (GMVAE). Additionally, we propose a two-part normalizing-flow-based pose generator that sequentially generates upper- and lower-body poses. This two-part model improves motion quality and the accuracy with which conditions are satisfied, compared with a single model generating the whole body. Our experiments show that DAFNet can create continuous character motion for indoor scene scenarios, and both qualitative and quantitative evaluations demonstrate the effectiveness of our framework.

Item: Online Avatar Motion Adaptation to Morphologically-similar Spaces (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Choi, Soojin; Hong, Seokpyo; Cho, Kyungmin; Kim, Chaelin; Noh, Junyong; Myszkowski, Karol; Niessner, Matthias
In avatar-mediated telepresence systems, a similar environment is assumed for the involved spaces, so that the avatar in the remote space can imitate the user's motion with the proper semantic intention performed in the local space. For example, the user touching a desk should be reproduced by the avatar in the remote space to correctly convey the intended meaning. It is unlikely, however, that the two involved physical spaces are exactly the same, in terms of the size of the room or the locations of the placed objects. Therefore, a naive mapping of the user's joint motion to the avatar will not create semantically correct motion of the avatar in relation to the remote environment. Existing studies have addressed the problem of retargeting human motions to an avatar for telepresence applications; few, however, have focused on retargeting continuous full-body motions such as locomotion and object-interaction motions in a unified manner. In this paper, we propose a novel motion adaptation method that generates the full-body motions of a human-like avatar on the fly in the remote space. The proposed method handles locomotion and object-interaction motions, as well as smooth transitions between them according to given user actions, under the condition of a bijective environment mapping between morphologically similar spaces. Our experiments show the effectiveness of the proposed method in generating plausible and semantically correct full-body motions of an avatar in room-scale space.
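A minimal sketch of the positional backbone that any such bijective environment mapping needs, reduced here to two axis-aligned rectangular rooms; the paper's mapping additionally aligns corresponding objects between the spaces, which this toy version ignores:

    import numpy as np

    def room_map(p_local, local_bounds, remote_bounds):
        # local_bounds / remote_bounds: (min_corner, max_corner) arrays of the
        # two rooms. Normalized coordinates inside the local room are
        # re-applied in the remote room, giving a trivially bijective map.
        lo_l, hi_l = local_bounds
        lo_r, hi_r = remote_bounds
        t = (p_local - lo_l) / (hi_l - lo_l)   # in [0,1]^3 inside local room
        return lo_r + t * (hi_r - lo_r)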
Item: ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech (Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2023)
Ghorbani, Saeed; Ferstl, Ylva; Holden, Daniel; Troje, Nikolaus F.; Carbonneau, Marc-André; Hauser, Helwig and Alliez, Pierre
We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means that style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a variational framework to learn a style embedding, making it easy to modify style through latent-space manipulation or blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs given the same input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion including fingers, with speech, spanning across 19 different styles. Our code and data are publicly available at .
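A hedged sketch of how the described style control composes at inference time, with style_encoder and decoder standing in for the framework's learned components and the tensor shapes being illustrative:

    import torch

    def blended_style(style_encoder, clip_a, clip_b, alpha=0.5, scale=1.0):
        # Zero-shot control: a short example clip suffices to obtain a style
        # embedding; embeddings can then be blended or scaled before decoding.
        with torch.no_grad():
            s_a = style_encoder(clip_a)   # (1, D) style embedding
            s_b = style_encoder(clip_b)
        return scale * (alpha * s_a + (1 - alpha) * s_b)

    # e.g. gesture = decoder(speech_features,
    #                        blended_style(enc, happy_clip, tired_clip))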