High-Performance Graphics 2023
High-Performance Graphics 2023 CGF 42-8: Frontmatter
Bikker, Jacco; Gribble, Christiaan
Acceleration Structures
Edge-Friend: Fast and Deterministic Catmull-Clark Subdivision Surfaces
Kuth, Bastian; Oberberger, Max; Chajdas, Matthäus; Meyer, Quirin
Primitives, Surfaces, and Appearance Modeling
Sampling Visible GGX Normals with Spherical Caps
Dupuy, Jonathan; Benyoub, Anis
Primitives, Surfaces, and Appearance Modeling
Real-Time Rendering of Glinty Appearances using Distributed Binomial Laws on Anisotropic Grids
Deliot, Thomas; Belcour, Laurent
Primitives, Surfaces, and Appearance Modeling
Real-Time Ray Tracing of Micro-Poly Geometry with Hierarchical Level of Detail
Benthin, Carsten; Peters, Christoph
Deep Learning for Graphics
Generative Adversarial Shaders for Real-Time Realism Enhancement
Salmi, Arturo; Cséfalvay, Szabolcs; Imber, James
Distributed and Cloud-Based Rendering
Data Parallel Multi-GPU Path Tracing using Ray Queue Cycling
Wald, Ingo; Jaros, Milan; Zellmann, Stefan
GPU Computing
GPU-Accelerated LOD Generation for Point Clouds
Schütz, Markus; Kerbl, Bernhard; Klaus, Philip; Wimmer, Michael
Recent Submissions
High-Performance Graphics 2023 CGF 42-8: Frontmatter
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Bikker, Jacco; Gribble, Christiaan; Bikker, Jacco; Gribble, Christiaan

Edge-Friend: Fast and Deterministic Catmull-Clark Subdivision Surfaces
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Kuth, Bastian; Oberberger, Max; Chajdas, Matthäus; Meyer, Quirin; Bikker, Jacco; Gribble, Christiaan
We present edge-friend, a data structure for quad meshes with access to neighborhood information required for Catmull-Clark subdivision surface refinement. Edge-friend enables efficient real-time subdivision surface rendering. In particular, the resulting algorithm is deterministic, does not require hardware support for atomic floating-point arithmetic, and is optimized for efficient rendering on GPUs. Edge-friend exploits the fact that after one subdivision step, two edges can be uniquely and implicitly assigned to each quad. Additionally, edge-friend is a compact data structure, adding little overhead. Our algorithm is simple to implement in a single compute shader kernel, and requires minimal synchronization, which makes it particularly suited for asynchronous execution. We easily extend our kernel to support relevant Catmull-Clark subdivision surface features, including semi-smooth creases, boundaries, animation and attribute interpolation. In case of topology changes, our data structure requires little preprocessing, making it amenable to a variety of applications, including real-time editing and animations. Our method can process and render billions of triangles per second on modern GPUs. For a sample mesh, our algorithm generates and renders 2.9 million triangles in 0.58 ms on an AMD Radeon RX 7900 XTX GPU.

Sampling Visible GGX Normals with Spherical Caps
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Dupuy, Jonathan; Benyoub, Anis; Bikker, Jacco; Gribble, Christiaan
Importance sampling the distribution of visible GGX normals requires sampling those of a hemisphere. In this work, we introduce a novel method for sampling such visible normals. Our method builds upon the insight that a hemispherical mirror reflects parallel light rays uniformly within a solid angle shaped as a spherical cap. This spherical cap has the same apex as the hemispherical mirror, and its aperture is given by the angle formed by the orientation of that apex and the direction of incident light rays. Based on this insight, we sample GGX visible normals as halfway vectors between a given incident direction and directions drawn from its associated spherical cap. Our resulting implementation is even simpler than that of Heitz and leads to up to 39% speed-ups in our benchmarks.

Real-Time Rendering of Glinty Appearances using Distributed Binomial Laws on Anisotropic Grids
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Deliot, Thomas; Belcour, Laurent; Bikker, Jacco; Gribble, Christiaan
In this work, we render glittery materials caused by discrete flakes on the surface in real time. To achieve this, one has to count the number of flakes reflecting the light towards the camera within every texel covered by a given pixel footprint. To do so, we derive a counting method for arbitrary footprints that, unlike previous work, outputs the correct statistics. We combine this counting method with an anisotropic parameterization of the texture space that reduces the number of texels falling under a pixel footprint. This allows our method to run with stable performance and 1.5× to 5× faster than the state of the art.

Real-Time Ray Tracing of Micro-Poly Geometry with Hierarchical Level of Detail
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Benthin, Carsten; Peters, Christoph; Bikker, Jacco; Gribble, Christiaan
In recent work, Nanite has demonstrated how to rasterize virtualized micro-poly geometry in real time, thus enabling immense geometric complexity. We present a system that employs similar methods for real-time ray tracing of micro-poly geometry. The geometry is preprocessed in almost the same fashion: Nearby triangles are clustered together and clusters get merged and simplified to obtain hierarchical level of detail (LOD). Then these clusters are compressed and stored in a GPU-friendly data structure. At run time, Nanite selects relevant clusters, decompresses them and immediately rasterizes them. Instead of rasterization, we decompress each selected cluster into a small bounding volume hierarchy (BVH) in the format expected by the ray tracing hardware. Then we build a complete BVH on top of the bounding volumes of these clusters and use it for ray tracing. Our BVH build reaches more than 74% of the attainable peak memory bandwidth and thus it can be done per frame. Since LOD selection happens per frame at the granularity of clusters, all triangles cover a small area in screen space.

Generative Adversarial Shaders for Real-Time Realism Enhancement
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Salmi, Arturo; Cséfalvay, Szabolcs; Imber, James; Bikker, Jacco; Gribble, Christiaan
Application of realism enhancement methods, particularly in real-time and resource-constrained settings, has been frustrated by the expense of existing methods. These achieve high quality results only at the cost of long runtimes and high bandwidth, memory, and power requirements. We present an efficient alternative: a high-performance, generative shader-based approach that adapts machine learning techniques to real-time applications, even in resource-constrained settings such as embedded and mobile GPUs. The proposed learnable shader pipeline comprises differentiable functions that can be trained in an end-to-end manner using an adversarial objective, allowing for faithful reproduction of the appearance of a target image set without manual tuning. The shader pipeline is optimized for highly efficient execution on the target device, providing temporally stable, faster-than-real-time results with quality competitive with many neural network-based methods.

Data Parallel Multi-GPU Path Tracing using Ray Queue Cycling
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Wald, Ingo; Jaros, Milan; Zellmann, Stefan; Bikker, Jacco; Gribble, Christiaan
We propose a novel approach to data-parallel path tracing on single-node/multi-GPU hardware that builds on ray forwarding, but which aims, above all else, at generality and practicability. We do this by avoiding any attempts at reducing the number of traces or forward operations performed, and instead focus on always using all GPUs' aggregate compute and bandwidth to effectively trace each ray on every GPU. We show that, counter-intuitively, this is both feasible and desirable; and that when run on typical data-center/cloud hardware, the resulting framework not only achieves good performance and scalability, but also comes with significantly fewer limitations, assumptions, or preprocessing requirements than existing techniques.

GPU-Accelerated LOD Generation for Point Clouds
(The Eurographics Association and John Wiley & Sons Ltd., 2023) Schütz, Markus; Kerbl, Bernhard; Klaus, Philip; Wimmer, Michael; Bikker, Jacco; Gribble, Christiaan
About: We introduce a GPU-accelerated LOD construction process that creates a hybrid voxel-point-based variation of the widely used layered point cloud (LPC) structure for LOD rendering and streaming. The massive performance improvements provided by the GPU allow us to improve the quality of lower LODs via color filtering while still increasing construction speed compared to the non-filtered, CPU-based state of the art. Background: LOD structures are required to render hundreds of millions to trillions of points, but constructing them takes time. Results: LOD structures suitable for rendering and streaming are constructed at rates of about 1 billion points per second (with color filtering) to 4 billion points per second (sample-picking/random sampling, state of the art) on an RTX 3090, an improvement by a factor of 80 to 400 over the CPU-based state of the art (12 million points per second). Due to being in-core, model sizes are limited to about 500 million points per 24 GB of memory. Discussion: Our method currently focuses on maximizing in-core construction speed on the GPU. Issues such as out-of-core construction of arbitrarily large data sets are not addressed, but we expect it to be suitable as a component of bottom-up out-of-core LOD construction schemes.
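The difference between sample-picking and color filtering mentioned in the point cloud LOD abstract can be illustrated with a small CPU sketch. This is not the authors' GPU implementation; it assumes a plain uniform voxel grid, and the names `build_coarser_lod`, `voxel_size`, and `color_filtering` are hypothetical. With sample-picking, one existing point stands in for each occupied voxel; with color filtering, the voxel gets the average color of all points that fall into it.

```python
from collections import defaultdict


def build_coarser_lod(points, voxel_size, color_filtering=True):
    """Reduce a point list to one representative per occupied voxel.

    points: list of ((x, y, z), (r, g, b)) tuples.
    Illustrative only: a uniform grid on the CPU, not the paper's method.
    """
    # Group points by the integer coordinates of the voxel containing them.
    cells = defaultdict(list)
    for position, color in points:
        key = tuple(int(c // voxel_size) for c in position)
        cells[key].append((position, color))

    lod = []
    for key in sorted(cells):  # sorted for a deterministic output order
        members = cells[key]
        if color_filtering:
            # Voxel representative: cell center with the mean member color.
            center = tuple((k + 0.5) * voxel_size for k in key)
            count = len(members)
            mean_color = tuple(
                sum(color[i] for _, color in members) / count for i in range(3)
            )
            lod.append((center, mean_color))
        else:
            # Sample-picking: keep the first point that landed in the voxel.
            lod.append(members[0])
    return lod


sample_points = [
    ((0.1, 0.1, 0.1), (255, 0, 0)),
    ((0.4, 0.2, 0.3), (0, 0, 255)),
    ((1.5, 0.1, 0.1), (0, 255, 0)),
]
filtered = build_coarser_lod(sample_points, voxel_size=1.0, color_filtering=True)
picked = build_coarser_lod(sample_points, voxel_size=1.0, color_filtering=False)
```

In this sketch the red and blue points share a voxel, so the filtered LOD blends them into one purple voxel, while sample-picking keeps only the red point and discards the blue one.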