EGGH04: SIGGRAPH/Eurographics Workshop on Graphics Hardware 2004

A Quadrilateral Rendering Primitive

Hormann, Kai
Tarini, Marco

Squeeze: Numerical-Precision-Optimized Volume Rendering

Bitter, Ingmar
Neophytou, Neophytos
Mueller, Klaus
Kaufman, Arie E.

A Hierarchical Shadow Volume Algorithm

Aila, Timo
Akenine-Möller, Tomas

Tile-Based Texture Mapping on Graphics Hardware

Wei, Li-Yi

Mio: Fast Multipass Partitioning via Priority-Based Instruction Scheduling

Riffel, Andrew
Lefohn, Aaron E.
Vidimce, Kiril
Leone, Mark
Owens, John D.

Efficient Partitioning of Fragment Shaders for Multiple-Output Hardware

Foley, Tim
Houston, Mike
Hanrahan, Pat

A Flexible Simulation Framework for Graphics Architectures

Sheaffer, J. W.
Luebke, D.
Skadron, K.

PixelView: A View-Independent Graphics Rendering Architecture

Stewart, J.
Bennett, E.P.
McMillan, L.

Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip

Schmittler, Jörg
Woop, Sven
Wagner, Daniel
Paul, Wolfgang J.
Slusallek, Philipp

Silhouette Maps for Improved Texture Magnification

Sen, Pradeep

UberFlow: A GPU-Based Particle Engine

Kipfer, Peter
Segal, Mark
Westermann, Rüdiger

Hardware-based Simulation and Collision Detection for Large Particle Systems

Kolb, A.
Latta, L.
Rezk-Salama, C.

A Programmable Vertex Shader with Fixed-Point SIMD Datapath for Low Power Wireless Applications

Sohn, Ju-Ho
Woo, Ramchan
Yoo, Hoi-Jun

Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication

Fatahalian, K.
Sugerman, J.
Hanrahan, P.


BibTeX (EGGH04: SIGGRAPH/Eurographics Workshop on Graphics Hardware 2004)
@inproceedings{:10.2312/EGGH/EGGH04/007-014,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{A Quadrilateral Rendering Primitive}},
  author = {Hormann, Kai and Tarini, Marco},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/007-014}
}
@inproceedings{:10.2312/EGGH/EGGH04/025-034,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{Squeeze: Numerical-Precision-Optimized Volume Rendering}},
  author = {Bitter, Ingmar and Neophytou, Neophytos and Mueller, Klaus and Kaufman, Arie E.},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/025-034}
}
@inproceedings{:10.2312/EGGH/EGGH04/015-024,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{A Hierarchical Shadow Volume Algorithm}},
  author = {Aila, Timo and Akenine-Möller, Tomas},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/015-024}
}
@inproceedings{:10.2312/EGGH/EGGH04/055-064,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{Tile-Based Texture Mapping on Graphics Hardware}},
  author = {Wei, Li-Yi},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/055-064}
}
@inproceedings{:10.2312/EGGH/EGGH04/035-044,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{Mio: Fast Multipass Partitioning via Priority-Based Instruction Scheduling}},
  author = {Riffel, Andrew and Lefohn, Aaron E. and Vidimce, Kiril and Leone, Mark and Owens, John D.},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/035-044}
}
@inproceedings{:10.2312/EGGH/EGGH04/045-054,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{Efficient Partitioning of Fragment Shaders for Multiple-Output Hardware}},
  author = {Foley, Tim and Houston, Mike and Hanrahan, Pat},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/045-054}
}
@inproceedings{:10.2312/EGGH/EGGH04/085-094,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{A Flexible Simulation Framework for Graphics Architectures}},
  author = {Sheaffer, J. W. and Luebke, D. and Skadron, K.},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/085-094}
}
@inproceedings{:10.2312/EGGH/EGGH04/075-084,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{PixelView: A View-Independent Graphics Rendering Architecture}},
  author = {Stewart, J. and Bennett, E.P. and McMillan, L.},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/075-084}
}
@inproceedings{:10.2312/EGGH/EGGH04/095-106,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip}},
  author = {Schmittler, Jörg and Woop, Sven and Wagner, Daniel and Paul, Wolfgang J. and Slusallek, Philipp},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/095-106}
}
@inproceedings{:10.2312/EGGH/EGGH04/065-064,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{Silhouette Maps for Improved Texture Magnification}},
  author = {Sen, Pradeep},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/065-064}
}
@inproceedings{:10.2312/EGGH/EGGH04/115-122,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{UberFlow: A GPU-Based Particle Engine}},
  author = {Kipfer, Peter and Segal, Mark and Westermann, Rüdiger},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/115-122}
}
@inproceedings{:10.2312/EGGH/EGGH04/123-132,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{Hardware-based Simulation and Collision Detection for Large Particle Systems}},
  author = {Kolb, A. and Latta, L. and Rezk-Salama, C.},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/123-132}
}
@inproceedings{:10.2312/EGGH/EGGH04/107-114,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{A Programmable Vertex Shader with Fixed-Point SIMD Datapath for Low Power Wireless Applications}},
  author = {Sohn, Ju-Ho and Woo, Ramchan and Yoo, Hoi-Jun},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/107-114}
}
@inproceedings{:10.2312/EGGH/EGGH04/133-138,
  booktitle = {Graphics Hardware},
  editor = {Tomas Akenine-Moeller and Michael McCool},
  title = {{Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication}},
  author = {Fatahalian, K. and Sugerman, J. and Hanrahan, P.},
  year = {2004},
  publisher = {The Eurographics Association},
  ISSN = {1727-3471},
  ISBN = {3-905673-15-0},
  DOI = {10.2312/EGGH/EGGH04/133-138}
}

  • A Quadrilateral Rendering Primitive
    (The Eurographics Association, 2004) Hormann, Kai; Tarini, Marco; eds. Tomas Akenine-Moeller and Michael McCool
    The only surface primitives that are supported by common graphics hardware are triangles, and more complex shapes have to be triangulated before being sent to the rasterizer. Even quadrilaterals, which are frequently used in many applications, are rendered as a pair of triangles after being split along one of the two diagonals. This creates an undesirable C¹-discontinuity that is visible in the shading or texture signal. We propose a new method that overcomes this drawback and is designed to be implemented in hardware as a new rasterizer. It processes a potentially non-planar quadrilateral directly, without any splitting, and interpolates attributes smoothly inside the quadrilateral. This interpolation is based on a recent generalization of barycentric coordinates that we adapted to handle perspective correction and situations in which a quadrilateral is partially behind the point of view.
  • Squeeze: Numerical-Precision-Optimized Volume Rendering
    (The Eurographics Association, 2004) Bitter, Ingmar; Neophytou, Neophytos; Mueller, Klaus; Kaufman, Arie E.; eds. Tomas Akenine-Moeller and Michael McCool
    This paper discusses how to squeeze volume rendering into as few bits per operation as possible while still retaining excellent image quality. For each of the typical pipeline stages in texture-mapped volume rendering, ray casting and splatting, we provide a quantitative analysis of the theoretical and practical limits on the bit precision required for computation and storage. Applying this analysis to any volume rendering implementation makes it possible to balance the internal precisions against the desired final output precision, which can yield significant speedups and a reduced memory footprint.
  • A Hierarchical Shadow Volume Algorithm
    (The Eurographics Association, 2004) Aila, Timo; Akenine-Möller, Tomas; eds. Tomas Akenine-Moeller and Michael McCool
    The shadow volume algorithm is a popular technique for real-time shadow generation using graphics hardware. Its major disadvantage is that it is inherently fill-rate-limited, as the performance is inversely proportional to the area of the projected shadow volumes. We present a new algorithm that reduces the shadow volume rasterization work significantly. With our algorithm, the amount of per-pixel processing becomes proportional to the screen-space length of the visible shadow boundary instead of the projected area. The first stage of the algorithm finds 8×8 pixel tiles whose 3D bounding boxes are either completely inside or completely outside the shadow volume. After that, the second stage performs per-pixel computations only for the potential shadow boundary tiles. We outline a two-pass implementation and also describe an efficient single-pass hardware architecture in which the two stages are separated using a delay stream. The only modification required in applications is a new pair of calls for marking the beginning and end of a shadow volume. In our test scenes, the algorithm processes up to 11.5 times fewer pixels than current state-of-the-art methods, while reducing the external video memory bandwidth by a factor of up to 17.1.
  • Tile-Based Texture Mapping on Graphics Hardware
    (The Eurographics Association, 2004) Wei, Li-Yi; eds. Tomas Akenine-Moeller and Michael McCool
    Texture mapping has been a fundamental feature of commodity graphics hardware. However, a key challenge for texture mapping is how to store and manage large textures on graphics processors. In this paper, we present a tile-based texture mapping algorithm with which only a small set of texture tiles, rather than a large texture, has to be physically stored. Our algorithm generates an arbitrarily large and non-periodic virtual texture map from the small set of stored texture tiles. Because only a small set of tiles is stored, the storage requirement is reduced to a small constant, regardless of the size of the virtual texture. In addition, the tiles are generated and packed into a single texture map so that the hardware filtering of this packed texture map corresponds directly to the filtering of the virtual texture. We implement our algorithm as a fragment program and demonstrate its performance on the latest graphics processors.
  • Mio: Fast Multipass Partitioning via Priority-Based Instruction Scheduling
    (The Eurographics Association, 2004) Riffel, Andrew; Lefohn, Aaron E.; Vidimce, Kiril; Leone, Mark; Owens, John D.; eds. Tomas Akenine-Moeller and Michael McCool
    Real-time graphics hardware continues to offer improved resources for programmable vertex and fragment shaders. However, shader programmers continue to write shaders that require more resources than are available in the hardware. One way to virtualize the resources necessary to run complex shaders is to partition the shaders into multiple rendering passes. This problem, called the Multi-Pass Partitioning Problem (MPP), and a solution for the problem, Recursive Dominator Split (RDS), have been presented by Eric Chan et al. The O(n³) RDS algorithm and its heuristic-based O(n²) cousin, RDSh, are robust in that they can efficiently partition shaders for many architectures with varying resources. However, RDS's high runtime cost and inability to handle multiple outputs per pass make it less desirable for real-time use on today's latest graphics hardware. This paper redefines the MPP as a scheduling problem and uses scheduling algorithms that allow incremental resource estimation and pass computation in O(n log n) time. Our scheduling algorithm, Mio, is experimentally compared to RDS and shown to have better run-time scaling and produce comparable partitions for emerging hardware architectures.
  • Efficient Partitioning of Fragment Shaders for Multiple-Output Hardware
    (The Eurographics Association, 2004) Foley, Tim; Houston, Mike; Hanrahan, Pat; eds. Tomas Akenine-Moeller and Michael McCool
    Partitioning fragment shaders into multiple rendering passes is an effective technique for virtualizing shading resource limits in graphics hardware. The Recursive Dominator Split (RDS) algorithm is a polynomial-time algorithm for partitioning fragment shaders for real-time rendering that has been shown to generate efficient partitions. RDS does not, however, work for shaders with multiple outputs, and does not optimize for hardware with support for multiple render targets. We present Merging Recursive Dominator Split (MRDS), an extension of the RDS algorithm to shaders with arbitrary numbers of outputs which can efficiently utilize hardware support for multiple render targets, as well as a new cost metric for evaluating the quality of multipass partitions on modern consumer graphics hardware. We demonstrate that partitions generated by our algorithm execute more efficiently than those generated by RDS alone, and that our cost model is effective in predicting the relative performance of multipass partitions.
  • A Flexible Simulation Framework for Graphics Architectures
    (The Eurographics Association, 2004) Sheaffer, J. W.; Luebke, D.; Skadron, K.; eds. Tomas Akenine-Moeller and Michael McCool
    In this paper we describe a multipurpose tool for analysis of the performance characteristics of computer graphics hardware and software. We are developing Qsilver, a highly configurable micro-architectural simulator of the GPU that uses the Chromium system's ability to intercept and redirect an OpenGL stream. The simulator produces an annotated trace of graphics commands using Chromium, then runs the trace through a cycle-timer model to evaluate time-dependent behaviors of the various functional units. We demonstrate the use of Qsilver on a simple hypothetical architecture to analyze performance bottlenecks, to explore new GPU microarchitectures, and to model power and leakage properties. One innovation we explore is the use of dynamic voltage scaling across multiple clock domains to achieve significant energy savings at almost negligible performance cost. Finally, we discuss how other architectural features and experiments might be incorporated into the Qsilver framework.
  • PixelView: A View-Independent Graphics Rendering Architecture
    (The Eurographics Association, 2004) Stewart, J.; Bennett, E.P.; McMillan, L.; eds. Tomas Akenine-Moeller and Michael McCool
    We present a new computer graphics rendering architecture that allows all possible views to be extracted from a single traversal of a scene description. It supports a wide range of rendering primitives, including polygonal meshes, higher-order surface primitives (e.g. spheres, cylinders, and parametric patches), point-based models, and image-based representations. To demonstrate our concept, we have implemented a hardware prototype that includes a 4D, z-buffered frame-buffer supporting dynamic view selection at the time of raster scan-out. As a result, our implementation supports extremely low display-update latency. The PixelView architecture also supports rendering of the same scene for multiple eyes, which provides immediate benefits for stereo viewing methods like those used in today's virtual environments, particularly when there are multiple participants. In the future, view-independent graphics rendering hardware will also be essential to support the multitude of viewpoints required for real-time autostereoscopic and holographic display devices.
  • Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip
    (The Eurographics Association, 2004) Schmittler, Jörg; Woop, Sven; Wagner, Daniel; Paul, Wolfgang J.; Slusallek, Philipp; eds. Tomas Akenine-Moeller and Michael McCool
    Realtime ray tracing has recently established itself as a possible alternative to the current rasterization approach for interactive 3D graphics. However, the performance of existing software implementations is still severely limited by today's CPUs, and many CPUs are required to achieve realtime performance. In this paper we present a prototype implementation of the full ray tracing pipeline on a single FPGA chip. Running at only 90 MHz, it achieves realtime frame rates of 20 to 60 frames per second over a wide range of 3D scenes and includes support for texturing, multiple light sources, and multiple levels of reflection or transparency. A particularly interesting feature of the design is that the transformation unit required for supporting dynamic scenes is reused for other tasks as well, including efficient ray-triangle intersection and shading computations. Despite the additional support for dynamic scenes, this approach reduces the overall hardware cost by 68%. We evaluate the design and its implementation across a wide set of example scenes and demonstrate the benefits of dedicated realtime ray tracing hardware.
  • Silhouette Maps for Improved Texture Magnification
    (The Eurographics Association, 2004) Sen, Pradeep; eds. Tomas Akenine-Moeller and Michael McCool
    Texture mapping is a simple way of increasing visual realism without adding geometrical complexity. Because it is a discrete process, it is important to properly filter samples when the sampling rate of the texture differs from that of the final image. This is particularly problematic when the texture is magnified or minified. While reasonable approaches exist to tackle the minified case, few options exist for improving the quality of magnified textures in real-time applications. Most simply bilinearly interpolate between samples, yielding exceedingly blurry textures. In this paper, we address the real-time magnification problem by extending the silhouette map algorithm to general texturing. In particular, we discuss the creation of these silmap textures as well as a simple filtering scheme that allows for viewing at all levels of magnification. The technique was implemented on current graphics hardware and our results show that we can achieve a level of visual quality comparable to that of a much larger texture.
  • UberFlow: A GPU-Based Particle Engine
    (The Eurographics Association, 2004) Kipfer, Peter; Segal, Mark; Westermann, Rüdiger; eds. Tomas Akenine-Moeller and Michael McCool
    We present a system for real-time animation and rendering of large particle sets using GPU computation and memory objects in OpenGL. Memory objects can be used both as containers for geometry data stored on the graphics card and as render targets, providing an effective means for the manipulation and rendering of particle data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform particle manipulation are essential. Our system implements a versatile particle engine, including inter-particle collisions and visibility sorting. By combining memory objects with floating-point fragment programs, we have implemented a particle engine that entirely avoids the transfer of particle data at run-time. Our system can be seen as a forerunner of a new class of graphics algorithms, exploiting memory objects or similar concepts on upcoming graphics hardware to avoid bus bandwidth becoming the major performance bottleneck.
  • Hardware-based Simulation and Collision Detection for Large Particle Systems
    (The Eurographics Association, 2004) Kolb, A.; Latta, L.; Rezk-Salama, C.; eds. Tomas Akenine-Moeller and Michael McCool
    Particle systems have long been recognized as an essential building block for detail-rich and lively visual environments. Current implementations can handle up to 10,000 particles in real-time simulations and are mostly limited by the transfer of particle data from the main processor to the graphics hardware (GPU) for rendering. This paper introduces a full GPU implementation, using fragment shaders, of both the simulation and the rendering of a dynamically-growing particle system. Such an implementation can render up to 1 million particles in real-time on recent hardware. The massively parallel simulation handles collision detection and reaction of particles with objects of arbitrary shape. The collision detection is based on depth maps that represent the outer shape of an object. The depth maps store distance values and normal vectors for collision reaction. With a special texture-based indexing technique for representing normal vectors, standard 8-bit textures can be used to describe the complete depth map data. Alternatively, several depth maps can be stored in one floating point texture. In addition, a GPU-based parallel sorting algorithm is introduced that can be used to perform a depth sorting of the particles for correct alpha blending.
  • A Programmable Vertex Shader with Fixed-Point SIMD Datapath for Low Power Wireless Applications
    (The Eurographics Association, 2004) Sohn, Ju-Ho; Woo, Ramchan; Yoo, Hoi-Jun; eds. Tomas Akenine-Moeller and Michael McCool
    Real-time 3D graphics is becoming one of the attractive applications for 3G wireless terminals, although their battery lifetime and memory bandwidth limit the system resources available for graphics processing. Instead of using a dedicated hardware engine with complex functions, we propose an efficient hardware architecture for a low-power, programmable vertex shader. Our architecture includes the following three features: I) a fixed-point SIMD datapath that exploits parallelism in vertex processing while keeping power consumption low, II) a multithreaded coprocessor interface that decreases unwanted stalls between the main processor and the vertex shader, reducing power consumption through instruction-level power management, and III) a programmable vertex engine that increases datapath throughput through concurrent operation with the main processor. Simulation results show that the full 3D geometry pipeline can be performed at 7.2M vertices/sec with 115 mW power consumption for polygons using the OpenGL lighting model. In terms of processing speed normalized by power consumption (Kvertices/sec per milliwatt), this is about 10 times better than the latest graphics core with a floating-point datapath for wireless applications.
  • Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication
    (The Eurographics Association, 2004) Fatahalian, K.; Sugerman, J.; Hanrahan, P.; eds. Tomas Akenine-Moeller and Michael McCool
    Utilizing graphics hardware for general-purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an in-depth analysis of dense matrix-matrix multiplication, which reuses each element of the input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrix-matrix multiplication as an obvious candidate for efficient evaluation on GPUs, but, surprisingly, we find that even near-optimal GPU implementations are markedly less efficient than current cache-aware CPU approaches. We find the key cause of this inefficiency is that the GPU can fetch less data and yet execute more arithmetic operations per clock than the CPU when both are operating out of their closest caches. The lack of high-bandwidth access to cached data will impair the performance of GPU implementations of any computation featuring significant input reuse.
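The abstract of "A Quadrilateral Rendering Primitive" says attributes are interpolated with "a recent generalization of barycentric coordinates". The sketch below assumes that generalization is mean value coordinates and covers only the simplest case: a planar, convex quad in 2D with the sample point strictly inside. Perspective correction, non-planar quads and quads partially behind the viewer, all of which the paper handles, are left out, and the function names `mean_value_coords` and `interpolate` are hypothetical.

```python
import math


def mean_value_coords(p, quad):
    """Mean value coordinates of 2D point p with respect to a convex quad.

    quad lists four (x, y) vertices in counter-clockwise order. p must lie
    strictly inside the quad; on an edge or vertex a division by zero occurs.
    """
    n = len(quad)
    d = [(vx - p[0], vy - p[1]) for vx, vy in quad]   # vectors from p to each vertex
    r = [math.hypot(dx, dy) for dx, dy in d]          # their lengths
    # t[i] = tan(alpha_i / 2), alpha_i being the angle at p between vertices i and i+1
    t = []
    for i in range(n):
        j = (i + 1) % n
        cross = d[i][0] * d[j][1] - d[i][1] * d[j][0]
        dot = d[i][0] * d[j][0] + d[i][1] * d[j][1]
        t.append((r[i] * r[j] - dot) / cross)
    # unnormalized weights; t[i - 1] wraps around to the last edge when i == 0
    w = [(t[i - 1] + t[i]) / r[i] for i in range(n)]
    total = sum(w)
    return [wi / total for wi in w]


def interpolate(p, quad, attrs):
    """Blend one scalar attribute per vertex using mean value coordinates."""
    return sum(lam * a for lam, a in zip(mean_value_coords(p, quad), attrs))
```

Because the weights depend on all four vertices at once instead of on one of two triangles, a linear attribute is reproduced exactly and no crease appears along a diagonal. Whether this matches the arithmetic of the proposed rasterizer is not claimed here.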
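The first stage described in "A Hierarchical Shadow Volume Algorithm" classifies each 8×8 tile by testing its 3D bounding box against the shadow volume. A minimal sketch of such a test follows, under two assumptions the abstract does not state: the shadow volume is convex (for example, the volume cast by a single triangle), and it is given as a list of inward-facing planes. This is the standard box-versus-convex-polyhedron test, not necessarily the paper's hardware formulation, and `classify_tile` is a hypothetical name.

```python
Vec3 = tuple[float, float, float]
Plane = tuple[float, float, float, float]  # (a, b, c, d): inside where a*x + b*y + c*z + d >= 0


def classify_tile(lo: Vec3, hi: Vec3, planes: list[Plane]) -> str:
    """Classify a tile's axis-aligned 3D bounding box against a convex shadow volume.

    lo and hi hold the tile's minimum and maximum screen x, y and depth.
    Returns "inside", "outside" or "boundary"; only "boundary" tiles
    would need per-pixel shadow volume work in the second stage.
    """
    fully_inside = True
    for a, b, c, d in planes:
        normal = (a, b, c)
        # box corner farthest along the normal (most inside) and nearest (least inside)
        far = [hi[k] if normal[k] >= 0 else lo[k] for k in range(3)]
        near = [lo[k] if normal[k] >= 0 else hi[k] for k in range(3)]
        if a * far[0] + b * far[1] + c * far[2] + d < 0:
            return "outside"      # even the most-inside corner is behind this plane
        if a * near[0] + b * near[1] + c * near[2] + d < 0:
            fully_inside = False  # the box straddles this plane
    return "inside" if fully_inside else "boundary"
```

The test is conservative: a box that lies outside the volume but is not separated from it by any single plane is reported as "boundary", which costs extra per-pixel work but never produces a wrong shadow.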
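"Tile-Based Texture Mapping on Graphics Hardware" assembles an arbitrarily large, non-periodic virtual texture from a small set of stored tiles. One way to do this, sketched below under the assumption that the tiles are Wang tiles with `k` edge colors, is to hash every edge of the virtual tile grid to a color and look up the stored tile whose four edge colors match. Neighboring virtual tiles hash the same shared edge, so their borders agree by construction, and a complete tile set needs only `k ** 4` stored tiles however large the virtual texture is. The hash constants and all function names are hypothetical, and the paper's packing of tiles into one texture for correct hardware filtering is not reproduced.

```python
def edge_color(x: int, y: int, axis: int, k: int) -> int:
    """Hypothetical hash mapping one edge of the virtual tile grid to one of k colors.

    axis 0 is the horizontal edge at the bottom of cell (x, y);
    axis 1 is the vertical edge at the left of cell (x, y).
    """
    h = (x * 73856093) ^ (y * 19349663) ^ (axis * 83492791)
    return (h & 0xFFFFFFFF) % k


def tile_edges(i: int, j: int, k: int) -> tuple[int, int, int, int]:
    """(south, east, north, west) edge colors of virtual tile (i, j)."""
    south = edge_color(i, j, 0, k)
    north = edge_color(i, j + 1, 0, k)   # same edge as the south side of tile (i, j + 1)
    west = edge_color(i, j, 1, k)
    east = edge_color(i + 1, j, 1, k)    # same edge as the west side of tile (i + 1, j)
    return south, east, north, west


def stored_tile_index(i: int, j: int, k: int) -> int:
    """Index of the matching tile in a complete Wang tile set of k ** 4 stored tiles."""
    s, e, n, w = tile_edges(i, j, k)
    return ((s * k + e) * k + n) * k + w
```

With `k = 2`, for instance, every virtual tile maps to one of 16 stored tiles, while the pattern of choices across the grid follows the hash and does not repeat with a short period.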
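The Mio abstract recasts multipass partitioning as a scheduling problem solved in O(n log n) time. The sketch below illustrates the general idea with priority-based list scheduling over a shader's dependency DAG: ready instructions are taken from a heap in priority order, and a new pass starts whenever the current one reaches its instruction budget. It models a single resource limit (instructions per pass) and ignores registers, texture units and the cost of saving intermediate values between passes, all of which a real partitioner must track. The function name, the priority values and the `max_ops` limit are hypothetical.

```python
import heapq


def partition_passes(deps, priority, max_ops):
    """Greedy list scheduling of a shader DAG into rendering passes.

    deps: {op: [ops it depends on]}; priority: {op: number}, higher issues first;
    max_ops: hypothetical per-pass instruction limit (the only resource modeled).
    Returns a list of passes, each a list of ops in issue order.
    """
    succs = {op: [] for op in deps}
    indegree = {op: len(preds) for op, preds in deps.items()}
    for op, preds in deps.items():
        for pred in preds:
            succs[pred].append(op)
    # min-heap keyed on negated priority; ties are broken by op name
    ready = [(-priority[op], op) for op, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)
    passes, current = [], []
    while ready:
        _, op = heapq.heappop(ready)
        if len(current) == max_ops:   # current pass is full: close it, open a new one
            passes.append(current)
            current = []
        current.append(op)
        for succ in succs[op]:
            indegree[succ] -= 1
            if indegree[succ] == 0:   # all inputs scheduled, so succ is now ready
                heapq.heappush(ready, (-priority[succ], succ))
    if current:
        passes.append(current)
    return passes
```

Each instruction enters and leaves the heap once, so the loop costs O(n log n) plus the number of dependency edges. A value produced in one pass and consumed in a later one would have to be written to a render target, which this sketch does not charge for.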
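"Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip" highlights reusing the transformation unit, needed for dynamic scenes, for ray-triangle intersection. One plausible reading, assumed here rather than taken from the abstract, is a test in which each triangle is stored as an affine transform mapping it onto the unit triangle (0,0,0), (1,0,0), (0,1,0): the ray is transformed with the same dot products a transformation unit already computes, after which the hit test is trivial. The helper names are hypothetical, and the sketch uses Python floats rather than the hardware's number format.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit_triangle_transform(v0, v1, v2):
    """Precompute the affine map taking v0, v1, v2 to (0,0,0), (1,0,0), (0,1,0).

    The linear part is the inverse of the matrix with columns e1, e2, n, whose
    rows are (e2 x n), (n x e1), (e1 x e2) divided by the determinant.
    """
    e1, e2 = sub(v1, v0), sub(v2, v0)
    n = cross(e1, e2)
    det = dot(e1, cross(e2, n))
    rows = [tuple(c / det for c in r) for r in (cross(e2, n), cross(n, e1), cross(e1, e2))]
    return rows, v0


def intersect(origin, direction, xform):
    """Return (t, u, v) if the ray hits the triangle, else None."""
    rows, v0 = xform
    o = [dot(r, sub(origin, v0)) for r in rows]   # same work as transforming a point
    d = [dot(r, direction) for r in rows]         # same work as transforming a vector
    if d[2] == 0:
        return None                               # ray parallel to the triangle plane
    t = -o[2] / d[2]                              # where the ray reaches local z = 0
    u, v = o[0] + t * d[0], o[1] + t * d[1]
    if t >= 0 and u >= 0 and v >= 0 and u + v <= 1:
        return t, u, v
    return None
```

The per-triangle transform is computed once, so per ray the test costs six dot products, one division and a few comparisons; u and v are the barycentric coordinates that shading can use directly.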
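"Hardware-based Simulation and Collision Detection for Large Particle Systems" mentions a GPU-based parallel sort used to order particles for alpha blending. Sorting networks suit fragment programs because each step applies the same compare-exchange to every element independently. The sketch below uses a bitonic sort as a representative sorting network, which may differ from the network the paper uses, and runs each data-parallel step as a sequential loop; the power-of-two length requirement is an assumption of this sketch.

```python
def bitonic_sort(keys):
    """Sort a list whose length is a power of two with a bitonic sorting network.

    Each iteration of the inner `while` loop is one data-parallel step: every
    element is compared with exactly one partner, independently of all others,
    which is what would let a fragment program execute the step in one pass.
    """
    a = list(keys)
    n = len(a)
    assert n & (n - 1) == 0, "length must be a power of two"
    k = 2
    while k <= n:            # length of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # distance between compared elements in this step
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) if ascending else (a[i] < a[partner]):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

To draw particles back to front, sort `(-depth, index)` pairs and read the indices in order; the number of steps is O(log² n) regardless of the input, which keeps the pass count fixed.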
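"A Programmable Vertex Shader with Fixed-Point SIMD Datapath for Low Power Wireless Applications" relies on fixed-point rather than floating-point arithmetic. To show what that means for a vertex transform, the sketch below assumes a Q16.16 format (16 integer and 16 fractional bits), which the abstract does not specify, and treats each four-wide dot product as one SIMD operation: four integer multiplies, a shift to drop the extra fractional bits, and an adder tree. All names are hypothetical.

```python
FRAC_BITS = 16  # assumed Q16.16 format; the abstract does not give the word format


def to_fixed(x: float) -> int:
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    return q / (1 << FRAC_BITS)


def fx_mul(a: int, b: int) -> int:
    # the integer product carries 2 * FRAC_BITS fractional bits; the arithmetic
    # right shift truncates it back to FRAC_BITS (rounding toward -infinity)
    return (a * b) >> FRAC_BITS


def fx_dot4(row, vec) -> int:
    # one SIMD operation in this model: four parallel multiplies, then an adder tree
    return sum(fx_mul(r, v) for r, v in zip(row, vec))


def transform_vertex(matrix, vertex):
    """Multiply a 4x4 fixed-point matrix by a homogeneous fixed-point vertex."""
    return [fx_dot4(row, vertex) for row in matrix]
```

Integer multipliers and adders are much smaller than floating-point units, which is the source of the power savings the abstract reports. From its own figures, 7.2M vertices/sec at 115 mW works out to roughly 62.6 Kvertices/sec per milliwatt, the metric the authors use for comparison.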
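"Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication" turns on how much arithmetic each fetched value supports. The sketch below is a plain blocked (tiled) multiply that counts the words loaded from the input matrices; with b×b tiles it performs 2n³ floating-point operations for 2n³/b loaded words, so its arithmetic intensity is b operations per word. The cache model, in which exactly one tile of each input is resident at a time, is an assumption for illustration and not a description of any particular GPU or CPU.

```python
def blocked_matmul(A, B, b):
    """Multiply square matrices A and B (lists of rows) using b×b tiles.

    Assumes b divides n. Returns (C, words_loaded), where words_loaded counts
    input elements fetched into the modeled tile cache.
    """
    n = len(A)
    assert n % b == 0, "tile size must divide the matrix size"
    C = [[0.0] * n for _ in range(n)]
    loads = 0
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            for k0 in range(0, n, b):
                a_tile = [row[k0:k0 + b] for row in A[i0:i0 + b]]
                b_tile = [row[j0:j0 + b] for row in B[k0:k0 + b]]
                loads += 2 * b * b
                # each loaded element is reused b times within this tile product
                for i in range(b):
                    for j in range(b):
                        C[i0 + i][j0 + j] += sum(a_tile[i][k] * b_tile[k][j] for k in range(b))
    return C, loads
```

A kernel with intensity b stays compute-bound only if the processor can fetch at least one word from its closest cache for every b operations it executes; a processor that executes more operations per clock while fetching fewer words, as the abstract reports for GPUs, needs a larger b than its cache can hold and ends up bandwidth-bound.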