High-Performance Graphics 2015
Item: Adaptively Layered Statistical Volumetric Obscurance (ACM Siggraph, 2015)
Hendrick, Quintjin; Scandolo, Leonardo; Eisemann, Martin; Eisemann, Elmar
Editors: Petrik Clarberg and Elmar Eisemann
We accelerate volumetric obscurance, a variant of ambient occlusion, and solve undersampling artifacts, such as banding, noise, or blurring, that screen-space techniques traditionally suffer from. We make use of an efficient statistical model to evaluate the occlusion factor in screen space using a single sample. Overestimations and halos are reduced by adaptively layering the visible geometry. Bias at tilted surfaces is avoided by projecting and evaluating the volumetric obscurance in the tangent space of each surface point. We compare our approach to several traditional screen-space ambient obscurance techniques and show its competitive qualitative and quantitative performance. Our algorithm maps well to graphics hardware, does not require the traditional bilateral blur step of previous approaches, and avoids typical screen-space artifacts such as temporal instability due to undersampling.

Item: Deferred Attribute Interpolation for Memory-Efficient Deferred Shading (ACM Siggraph, 2015)
Schied, Christoph; Dachsbacher, Carsten
In this work we present a novel approach to deferred shading suitable for high-resolution displays and high visibility sampling rates. We reduce the memory cost of deferred shading by substituting the geometry buffer with a visibility buffer that stores references into a triangle buffer. The triangle buffer is populated dynamically with all visible triangles, which makes the approach compatible with tessellation. Stored triangles are represented by a sample point and screen-space partial derivatives. This representation allows for efficient attribute interpolation during shading and gives shaders knowledge of the partial derivatives of all attributes.
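The visibility-buffer representation described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code; all names (`TriangleRecord`, `interpolate`) are invented, and a real renderer would store packed GPU buffers rather than Python objects.

```python
# Hypothetical sketch: each visible triangle is stored once, with an attribute
# value at a sample point plus screen-space partial derivatives; shading then
# reconstructs the attribute at any covered pixel by a first-order expansion.

class TriangleRecord:
    def __init__(self, sx, sy, attr, ddx, ddy):
        self.sx, self.sy = sx, sy      # sample point (pixel coordinates)
        self.attr = attr               # attribute value at the sample point
        self.ddx, self.ddy = ddx, ddy  # screen-space partial derivatives

def interpolate(tri, px, py):
    """Reconstruct the attribute at pixel (px, py) from the stored record."""
    return tri.attr + tri.ddx * (px - tri.sx) + tri.ddy * (py - tri.sy)

# Visibility buffer: one triangle reference per pixel instead of full G-buffer
# data; the triangle buffer holds the shared per-triangle records.
triangle_buffer = [TriangleRecord(10.0, 10.0, 0.5, 0.25, -0.5)]
visibility_buffer = {(12, 11): 0}      # pixel -> index into triangle_buffer

tri = triangle_buffer[visibility_buffer[(12, 11)]]
print(interpolate(tri, 12, 11))        # 0.5 + 0.25*2 - 0.5*1 = 0.5
```

Because the derivatives are stored explicitly, shaders also get them for free, as the abstract notes.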
We show that the size of the visibility buffer can be further decreased by storing a linked list of visibility samples per pixel. For high-resolution displays, we propose an extension of our algorithm that performs shading at reduced frequency, allowing us to lower the sampling rate for computationally expensive but low-frequency signals such as indirect illumination.

Item: An Adaptive Acceleration Structure for Screen-space Ray Tracing (ACM Siggraph, 2015)
Widmer, S.; Pajak, D.; Schulz, A.; Pulli, K.; Kautz, J.; Goesele, M.; Luebke, D.
We propose an efficient acceleration structure for real-time screen-space ray tracing. The hybrid data structure represents the scene geometry by combining a bounding volume hierarchy with local planar approximations. This enables fast empty-space skipping while tracing and yields exact intersection points for the planar approximations. In combination with an occlusion-aware ray traversal, our algorithm can quickly trace even multiple depth layers. Compared to prior work, our technique improves the accuracy of the results, is more general, and allows for advanced image transformations, as all pixels can cast rays in arbitrary directions. We demonstrate real-time performance for several applications, including depth-of-field rendering, stereo warping, and screen-space ray-traced reflections.

Item: Bounding Volume Hierarchy Optimization through Agglomerative Treelet Restructuring (ACM Siggraph, 2015)
Domingues, Leonardo R.; Pedrini, Helio
In this paper, we present a new method for building high-quality bounding volume hierarchies (BVHs) on manycore systems. Our method is an extension of the current state of the art in GPU BVH construction, Treelet Restructuring Bounding Volume Hierarchy (TRBVH), and consists of optimizing an already existing tree by rearranging subsets of its nodes using a bottom-up agglomerative clustering approach.
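As an illustration of the bottom-up agglomerative clustering mentioned above, here is a toy greedy variant over axis-aligned bounding boxes. The paper restructures treelets of an existing GPU BVH; this sketch, with invented names and a brute-force pair search, only shows the merge-by-smallest-surface-area idea.

```python
# Boxes are ((minx, miny, minz), (maxx, maxy, maxz)); nodes are
# (box, left_child, right_child), with None children for leaves.

def union(a, b):
    lo = tuple(min(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(max(a[1][i], b[1][i]) for i in range(3))
    return (lo, hi)

def surface_area(box):
    dx, dy, dz = (box[1][i] - box[0][i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def agglomerate(leaves):
    """Repeatedly merge the pair of nodes whose union has the least area."""
    nodes = [(box, None, None) for box in leaves]
    while len(nodes) > 1:
        i, j = min(((i, j) for i in range(len(nodes))
                    for j in range(i + 1, len(nodes))),
                   key=lambda ij: surface_area(union(nodes[ij[0]][0],
                                                     nodes[ij[1]][0])))
        merged = (union(nodes[i][0], nodes[j][0]), nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]
```

Nearby boxes cluster first, so an isolated primitive only joins the tree near the root, which is the behavior a surface-area-driven builder wants.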
We implemented our solution for the NVIDIA Kepler architecture using CUDA and tested it on 16 distinct scenes, most of which are commonly used to evaluate the performance of acceleration structures. We show that our implementation produces trees whose quality is on par with those generated by TRBVH for these scenes, while being about 30% faster.

Item: Compiling High Performance Recursive Filters (ACM Siggraph, 2015)
Chaurasia, Gaurav; Ragan-Kelley, Jonathan; Paris, Sylvain; Drettakis, George; Durand, Frédo
Infinite impulse response (IIR), or recursive, filters are essential for image processing because they turn expensive large-footprint convolutions into operations with a constant cost per pixel regardless of kernel size. However, their recursive nature constrains the order in which pixels can be computed, severely limiting both parallelism within a filter and memory locality across multiple filters. Prior research has developed algorithms that can compute IIR filters over image tiles. Using a divide-and-recombine strategy inspired by parallel prefix sum, they expose greater parallelism and exploit producer-consumer locality in pipelines of IIR filters over multidimensional images. While the principles are simple, it is hard, given a recursive filter, to derive a corresponding tile-parallel algorithm, and even harder to implement and debug it. We show that parallel and locality-aware implementations of IIR filter pipelines can be obtained through program transformations, which we mechanize through a domain-specific compiler. We show that the composition of a small set of transformations suffices to cover the space of possible strategies. We also demonstrate that the tiled implementations can be automatically scheduled in hardware-specific ways using a small set of generic heuristics.
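The divide-and-recombine strategy for recursive filters can be illustrated on a first-order causal filter y[i] = x[i] + a*y[i-1]. This is a hand-written sketch of the tiling idea, not output of the paper's compiler: each tile is filtered independently with zero initial state, then a cheap sequential pass propagates the carried state, adding a^(k+1) times the incoming carry to the k-th sample of each tile.

```python
def iir_sequential(x, a):
    """Reference: y[i] = x[i] + a*y[i-1], strictly sequential."""
    y, prev = [], 0.0
    for v in x:
        prev = v + a * prev
        y.append(prev)
    return y

def iir_tiled(x, a, tile=4):
    """Tile-parallel formulation: the per-tile passes are independent."""
    tiles = [x[i:i + tile] for i in range(0, len(x), tile)]
    local = [iir_sequential(t, a) for t in tiles]   # parallelizable
    out, carry = [], 0.0
    for t in local:                                 # cheap recombine pass
        for k, v in enumerate(t):
            out.append(v + (a ** (k + 1)) * carry)  # inject carried state
        carry = out[-1]
    return out
```

The per-tile filters can run concurrently; only the short recombine pass is sequential, which is exactly the parallelism/locality trade the abstract describes.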
The programmer specifies the basic recursive filters, and the choice of transformation requires only a few lines of code. Our compiler then generates high-performance implementations that are an order of magnitude faster than standard GPU implementations and outperform hand-tuned tiled implementations of specialized algorithms that require orders of magnitude more programming effort: a few lines of code instead of a few thousand lines per pipeline.

Item: Decoupled Coverage Anti-Aliasing (ACM Siggraph, 2015)
Wang, Yuxiang; Wyman, Chris; He, Yong; Sen, Pradeep
State-of-the-art methods for geometric anti-aliasing in real-time rendering are based on Multi-Sample Anti-Aliasing (MSAA), which samples visibility at a higher rate than shading to reduce the number of expensive shading calculations. However, for high-quality results the number of visibility samples needs to be large (e.g., 64 samples/pixel), which requires significant memory because visibility samples are usually 24-bit depth values. In this paper, we present Decoupled Coverage Anti-Aliasing (DCAA), which improves upon MSAA by further decoupling coverage from visibility for high-quality geometric anti-aliasing. Our work is based on the previously explored idea that all fragments at a pixel can be consolidated into a small set of visible surfaces. Although in the past this was only used to reduce the memory footprint of the G-buffer for deferred shading with MSAA, we leverage this idea to represent each consolidated surface with a 64-bit binary coverage mask and a single decoupled depth value, thus significantly reducing the overhead of high-quality anti-aliasing. To do this, we introduce new surface merging heuristics and resolve mechanisms to manage the decoupled depth and coverage samples.
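A toy model of the decoupled coverage representation described above: each per-pixel surface keeps one depth value and a 64-bit coverage bitmask, and the resolve step weights each surface's color by its visible coverage. The depth-threshold merge heuristic here is a placeholder, not the paper's; all names are invented.

```python
DEPTH_MERGE_EPS = 0.01   # assumed threshold, purely illustrative

def insert_fragment(surfaces, depth, color, mask):
    """Consolidate a fragment into an existing surface or start a new one."""
    for s in surfaces:
        if abs(s["depth"] - depth) < DEPTH_MERGE_EPS:
            s["mask"] |= mask            # same surface: just OR the coverage
            return
    surfaces.append({"depth": depth, "color": color, "mask": mask})

def resolve(surfaces, samples=64):
    """Front-to-back resolve: occluded coverage bits do not contribute."""
    surfaces = sorted(surfaces, key=lambda s: s["depth"])
    covered, out = 0, (0.0, 0.0, 0.0)
    for s in surfaces:
        visible = s["mask"] & ~covered
        w = bin(visible).count("1") / samples   # popcount of visible bits
        out = tuple(o + w * c for o, c in zip(out, s["color"]))
        covered |= s["mask"]
    return out
```

With a handful of surfaces per pixel instead of 64 depth samples, the storage drops to one depth plus one 64-bit mask per surface, which is the memory saving the abstract claims.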
Our prototype implementation runs in real time on current graphics hardware and achieves a significant reduction in geometric aliasing with less memory overhead than 8x MSAA for several complex scenes.

Item: An Incremental Rendering VM (ACM Siggraph, 2015)
Haaser, Georg; Steinlechner, Harald; Maierhofer, Stefan; Tobler, Robert F.
We introduce an incremental rendering layer on top of standard graphics APIs such as OpenGL or DirectX in the form of a virtual machine (VM), which efficiently maintains an optimized, compiled representation of arbitrary high-level scene representations at all times. This includes incremental processing of structural changes such as additions and removals of scene parts, as well as in-place updates of scene data. Our approach achieves a significant frame-rate increase for typical workloads and reasonable performance for high-frequency changes. Processing is performed in O(Δ) time, where Δ is proportional to the size of the change, and the optimized representation has no runtime overhead with respect to the underlying graphics API. This is achieved by tracking and applying all changes as incremental updates to appropriate data structures and by adaptively synthesizing a program of abstract machine code. In a final step, this abstract program is incrementally mapped to executable machine code, comparable to what just-in-time compilers do.
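A minimal sketch of the incremental-update idea, with invented names. The real system compiles stateless graphics commands down to executable machine code; this Python model only mimics that with per-object command lists, but it shows why a structural change costs work proportional to the change rather than to the scene.

```python
# Hypothetical model: the "VM" keeps a compiled command list keyed by scene
# object, so adds/removes/updates touch only the entries for changed objects.

class RenderVM:
    def __init__(self):
        self.compiled = {}         # object id -> compiled command list
        self.recompilations = 0    # counts how much work changes cost

    def _compile(self, obj_id, obj):
        self.recompilations += 1
        return [("bind", obj["shader"]), ("draw", obj_id)]

    def apply_change(self, obj_id, obj):
        """Add a new object or update an existing one in place."""
        self.compiled[obj_id] = self._compile(obj_id, obj)

    def remove(self, obj_id):
        self.compiled.pop(obj_id, None)

    def execute(self):
        """Replay the maintained representation; no per-frame recompilation."""
        return [cmd for cmds in self.compiled.values() for cmd in cmds]
```

Note that `execute` does no compilation at all: the compiled representation is maintained across frames, and only `apply_change`/`remove` pay a cost, one compilation per changed object.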
Our main contributions are (i) an abstract interface for rendering and visualization systems enabling incremental evaluation, (ii) adaptively optimized abstract machine code in the context of stateless graphics commands, and (iii) subsequent adaptive compilation to executable machine code, including on-the-fly defragmentation.

Item: Perception of Highlight Disparity at a Distance in Consumer Head-Mounted Displays (ACM Siggraph, 2015)
Toth, Robert; Hasselgren, Jon; Akenine-Möller, Tomas
Stereo rendering for 3D displays and virtual reality headsets provides several visual cues, including convergence angle and highlight disparity. The human visual system interprets these cues to estimate surface properties of the displayed environment. Naïve stereo rendering effectively doubles the computational burden of image synthesis, so it is desirable to reuse as many computations as possible between the stereo image pair. Computing a single radiance for a point on a surface, to be used when synthesizing both the left and right images, results in the loss of highlight disparity. Our hypothesis is that the absence of highlight disparity does not impair perception of surface properties at larger distances, owing to the ever-decreasing angular difference between the surface and the two viewpoints as the distance to the surface increases. The effect is exacerbated by the limited resolution of consumer head-mounted displays.
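The geometric core of the hypothesis is easy to check numerically: the angle subtended at a surface point by the two eyes shrinks with distance. The interpupillary distance and angular pixel size below are assumed round numbers for a consumer HMD, not figures from the paper.

```python
import math

IPD = 0.064                               # assumed interpupillary distance (m)
PIXEL_ANGLE = math.radians(10.0 / 60.0)   # assumed ~10 arcmin per pixel

def vergence_angle(distance):
    """Angle subtended at the surface point by the two eyes, in radians."""
    return 2.0 * math.atan(IPD / 2.0 / distance)

for d in (0.5, 2.0, 8.0):
    deg = math.degrees(vergence_angle(d))
    px = vergence_angle(d) / PIXEL_ANGLE
    print("%4.1f m: vergence %.2f deg, ~%.1f pixel-angles" % (d, deg, px))
```

At a few meters the angular difference between the eyes already shrinks toward the display's angular pixel size, which is consistent with the claim that highlight disparity matters less at larger distances on low-resolution headsets.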
We verify this hypothesis with a user study and provide rendering guidelines that leverage our findings.

Item: Reorder Buffer: An Energy-Efficient Multithreading Architecture for Hardware MIMD Ray Traversal (ACM Siggraph, 2015)
Lee, Won-Jong; Shin, Youngsam; Hwang, Seok Joong; Kang, Seok; Yoo, Jeong-Joon; Ryu, Soojung
In this paper, we present an energy- and area-efficient multithreading architecture for Multiple Instruction, Multiple Data (MIMD) ray tracing hardware targeted at low-power devices. Recent ray tracing hardware has predominantly adopted a MIMD approach for efficient parallel traversal of incoherent rays, and supports a multithreading scheme to hide latency and to resolve memory divergence. However, the conventional multithreading scheme has drawbacks: increased memory cost for thread storage, and additional energy consumed by bypassing threads through the pipeline. Consequently, we propose a new multithreading architecture called Reorder Buffer, which solves these problems by dynamically reordering the rays in the input buffer according to the results of cache accesses. Unlike conventional schemes, Reorder Buffer is cost-effective and energy-efficient because it needs neither additional thread memory nor extra energy, making use of existing resources instead. Simulation results show that our architecture is a potentially versatile solution for future ray tracing hardware in low-energy devices: it provides up to 11.7% better cache utilization and is up to 4.7 times more energy-efficient than the conventional architecture.

Item: Efficient Ray Tracing of Subdivision Surfaces using Tessellation Caching (ACM Siggraph, 2015)
Benthin, Carsten; Woop, Sven; Nießner, Matthias; Selgrad, Kai; Wald, Ingo
A common way to ray trace subdivision surfaces is to construct and traverse spatial hierarchies built on top of tessellated input primitives.
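A software model of the Reorder Buffer scheme described above, with invented names: instead of bypassing stalled threads through the pipeline, a ray whose node fetch misses the cache is moved to the back of the input buffer and retried once its data is (notionally) resident. Real hardware overlaps the fetch with other rays' traversal, which this single-threaded sketch only approximates.

```python
from collections import deque

def schedule(rays, cache):
    """rays: list of (ray_id, node) work items; cache: set of resident nodes.
    Returns the issue order plus the number of cache misses encountered."""
    buf = deque(rays)
    issued, misses = [], 0
    while buf:
        ray_id, node = buf.popleft()
        if node in cache:
            issued.append(ray_id)        # hit: ray proceeds down the pipeline
        else:
            misses += 1
            cache.add(node)              # model the pending fetch completing
            buf.append((ray_id, node))   # reorder: retry once data is resident
    return issued, misses
```

Rays that hit the cache are serviced ahead of earlier rays that missed, so the pipeline never carries stalled work, which is the energy argument the abstract makes.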
Unfortunately, tessellating surfaces requires a substantial amount of memory and involves significant construction and memory I/O costs. In this paper, we propose a lazy-build caching scheme that handles these problems efficiently while also exploiting the capabilities of today's many-core architectures. To this end, we lazily tessellate patches only when necessary and utilize adaptive subdivision to efficiently evaluate the underlying surface representation. The core idea of our approach is a shared lazy-evaluation cache, which triggers and maintains the surface tessellation. We combine our caching scheme with SIMD-optimized subdivision-primitive evaluation and fast hierarchy construction over the tessellated surface. This allows us to achieve high ray tracing performance in complex scenes, outperforming the state of the art while requiring only a fraction of the memory. In addition, our method stays within a fixed memory budget regardless of the tessellation level, which is essential for many applications such as movie-production rendering. Beyond the results of this paper, we have integrated our method into Embree, an open-source ray tracing framework, thus making interactive ray tracing of subdivision surfaces publicly available.

Item: Grid-Free Out-Of-Core Voxelization to Sparse Voxel Octrees on GPU (ACM Siggraph, 2015)
Pätzold, Martin; Kolb, Andreas
In this paper, we present the first grid-free, out-of-core GPU voxelization method. Our method combines efficient parallel triangle voxelization on the GPU with out-of-core techniques in order to allow the processing of scenes with large triangle counts at high resolution. We generate the voxelized data directly in a sparse voxel octree (SVO) representation, without any intermediate grid structure ("grid-free"). We apply triangle preprocessing and avoid atomic operations, leading to a well-balanced GPU workload and efficient parallel triangle processing.
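The "grid-free" construction can be illustrated with Morton-keyed insertion into a sparse map: voxel coordinates are bit-interleaved into octree keys and inserted directly, never allocating a dense grid. This sketch invents all names, stubs out triangle-to-voxel coverage, and stands in for the paper's GPU pipeline only conceptually.

```python
def morton3(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into a 3D Morton (octree) key."""
    key = 0
    for i in range(bits):
        key |= (((x >> i) & 1) << (3 * i)
                | ((y >> i) & 1) << (3 * i + 1)
                | ((z >> i) & 1) << (3 * i + 2))
    return key

def build_svo(voxels):
    """Sparse map: Morton key -> accumulated attribute (here a sample count),
    so every triangle contribution to a voxel stays accessible."""
    svo = {}
    for (x, y, z) in voxels:
        key = morton3(x, y, z)
        svo[key] = svo.get(key, 0) + 1
    return svo
```

Because the key encodes the full octree path, sorting voxel fragments by key also groups them per node, which is what makes streaming, out-of-core construction possible.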
Compared to existing out-of-core CPU approaches, we handle voxel attributes properly, i.e., all triangle attributes contributing to a voxel are accessible when the voxel attribute is calculated. We test and compare our approach against state-of-the-art methods and demonstrate its viability in terms of speed, input triangle count, resolution, and output quality.

Item: Morton Integrals for High Speed Geometry Simplification (ACM Siggraph, 2015)
Legrand, Hélène; Boubekeur, Tamy
Real-time geometry processing has progressively reached a performance level that makes a number of signal-inspired primitives practical for online application scenarios. This often comes through the joint design of operators, data structures, and even dedicated hardware. Among the major classes of geometric operators, filtering and super-sampling (via tessellation) have been successfully expressed under high-performance constraints. The subsampling operator, i.e., adaptive simplification, remains however a challenging case for non-trivial input models. In this paper, we build a fast geometry simplification algorithm over a new concept: Morton Integrals. By summing up quadric error metric matrices along Morton-ordered surface samples, we can concurrently extract the nodes of an adaptive cut in the implicit hierarchy so defined, and optimize all simplified vertices in parallel. This approach is inspired by integral images and exploits recent advances in high-performance spatial hierarchy construction and traversal. As a result, our GPU implementation can downsample a mesh made of several million polygons at interactive rates, while providing better quality than uniform simplification and preserving important salient features. We present results for surface meshes, polygon soups, and point clouds, and discuss variations of our approach that account for per-sample attributes and alternative error metrics.
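The Morton Integral idea mirrors integral images: once per-sample quadric error matrices are prefix-summed along the Morton-ordered sample list, the aggregate quadric of any contiguous Morton range, and hence of any node of the implicit hierarchy, is a single subtraction. A minimal sketch (names invented, quadrics flattened to tuples of coefficients):

```python
def prefix_sums(quadrics):
    """quadrics: list of flattened symmetric QEM matrices (equal-length tuples).
    Returns out with out[k] = elementwise sum of quadrics[0:k]."""
    n = len(quadrics[0])
    acc, out = (0.0,) * n, [(0.0,) * n]
    for q in quadrics:
        acc = tuple(a + b for a, b in zip(acc, q))
        out.append(acc)
    return out

def range_quadric(prefix, i, j):
    """Aggregate quadric of Morton-ordered samples i..j-1: one subtraction,
    exactly as a 1D integral image answers a range-sum query."""
    return tuple(b - a for a, b in zip(prefix[i], prefix[j]))
```

Since every candidate node of the adaptive cut is such a range, all nodes can be evaluated concurrently from the same prefix array, which is what enables the parallel vertex optimization the abstract describes.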