EGGH97: SIGGRAPH/Eurographics Workshop on Graphics Hardware 1997
Item: PixelFlow: The Realization (The Eurographics Association, 1997)
Authors: Eyles, John; Molnar, Steven; Poulton, John; Greer, Trey; Lastra, Anselmo; England, Nick; Westover, Lee
Editors: A. Kaufmann, W. Strasser, S. Molnar, B.-O. Schneider
PixelFlow is an architecture for high-speed, highly realistic image generation, based on the techniques of object-parallelism and image composition. Its initial architecture was described in [MOLN92]. After development by the original team of researchers at the University of North Carolina, and co-development with the industry partners Division Ltd. and Hewlett-Packard, PixelFlow is now a much more capable system than initially conceived, and its hardware and software systems have evolved considerably. This paper describes the final realization of PixelFlow, along with previously unpublished hardware and software enhancements.

Item: Towards Real-Time Photorealistic Rendering: Challenges and Solutions (The Eurographics Association, 1997)
Authors: Schilling, Andreas
A growing number of real-time applications need graphics of photorealistic quality, especially in the field of training (virtual operation, driving and flight simulation), but also in the areas of design and ergonomic research. We take a closer look at the main deficiencies of today's real-time graphics hardware and present solutions for several of the identified problems in the areas of antialiasing and texture, bump, and reflection mapping. In the second part of the paper, a new method for antialiasing bump maps is explained in more detail.

Item: High Quality Rendering Using the Talisman Architecture (The Eurographics Association, 1997)
Authors: Barkans, Anthony C.
Currently, graphics devices that offer both high-performance and high-quality interactive rendering are priced at a level that places them out of the reach of the broad number of users that constitutes the mass market.
Because of the cost constraints placed on graphics devices designed for the mass market, they often trade off image quality in order to get reasonable rendering rates with minimal hardware. This approach is not leading to rapid adoption of true 3D graphics technology by the broadest number of users. The goal of the Talisman initiative is to make 3D graphics truly ubiquitous. This requires that both high-performance and high-quality interactive rendering be made available at mass-market price points, which means that trading off image quality as a means to obtain high-performance rendering is unacceptable. In this paper it is shown that high-quality rendering is a natural extension of the high-performance rendering architecture embodied in Talisman.

Item: Characterization of Static 3D Graphics Workloads (The Eurographics Association, 1997)
Authors: Chiueh, Tzi-cker; Lin, Wei-jen
3D graphics transforms 3D models into 2D images by simulating the physics of light propagation from the light sources, through the objects, and eventually to the eyes. Although specialized graphics hardware engines have been proposed and implemented in the past, and heated interest in PC-class 3D graphics cards is currently emerging, detailed descriptions and analyses of the 3D graphics workloads on which graphics hardware design can be based are almost non-existent. This work takes the first step towards a comprehensive 3D graphics workload characterization by reporting the results of an empirical study using an instrumented software polygonal renderer tested on a wide variety of static 3D graphics models with sufficiently sophisticated geometric and texture properties.

Item: Design of a High Performance Volume Visualization System (The Eurographics Association, 1997)
Authors: Lichtenbelt, Barthold
Visualizing three-dimensional discrete datasets has been the topic of many research projects and papers in the past decade. We discuss the issues that arise when designing a whole computer system capable of visualizing these datasets in real time. We explain the three-way chicken-and-egg problem and discuss Hewlett-Packard's effort to break it with the Voxelator API extensions to OpenGL. We enumerate what a good hardware design should accomplish, discuss which system issues are important, and show how to integrate volume visualization hardware into one of Hewlett-Packard's graphics accelerators, the VISUALIZE-48XP. We show why the Voxelator is an efficient and well-designed API by explaining how various existing hardware engines fit easily into the Voxelator framework.

Item: Realizing OpenGL: Two Implementations of One Architecture (The Eurographics Association, 1997)
Authors: Kilgard, Mark J.
The OpenGL Graphics System provides a well-specified, widely accepted dataflow for 3D graphics and imaging. OpenGL is an architecture; an OpenGL-capable computer is a hardware manifestation or implementation of that architecture. The Onyx2 InfiniteReality and O2 workstations exemplify two very different implementations of OpenGL. The two designs respond to different cost, performance, and capability goals. Common practice is to describe a graphics hardware implementation based on how the hardware itself operates; this paper, however, discusses two OpenGL hardware implementations based on how they embody the OpenGL architecture. An important thread throughout is how OpenGL implementations can be designed not merely on graphics price-performance considerations, but also with consideration of larger system issues such as memory architecture, compression, and video processing.
Just as OpenGL is influenced by wider system concerns, OpenGL itself can provide a clarifying influence on system capabilities not conventionally thought of as graphics-related.

Item: Accommodating Memory Latency in a Low-Cost Rasterizer (The Eurographics Association, 1997)
Authors: Anderson, Bruce; MacAulay, Rob; Stewart, Andy; Whitted, Turner
This paper describes design tradeoffs in a very low-cost rasterizer circuit targeted for use in a video game console. The greatest single factor affecting such a design is the character of the memory to which the image generator is connected. Low cost generally constrains the memory dedicated to image generation to a single package with a single set of address and data lines. While overall memory bandwidth determines the upper limit of performance in such a small image generator, memory latency has a far greater effect on the design. The use of Rambus memory provides more than enough aggregate bandwidth for a frame buffer as long as blocks of pixels are moved in each transfer, but its high latency can stall any processor not matched to the memory. The design described here uses a long pixel pipeline to match its internal processing latency to the external frame buffer memory latency.

Item: EM-Cube: An Architecture for Low-Cost Real-Time Volume Rendering (The Eurographics Association, 1997)
Authors: Osborne, Randy; Pfister, Hanspeter; Lauer, Hugh; McKenzie, Neil; Gibson, Sarah; Hiatt, Wally; Ohkami, TakaHide
EM-Cube is a VLSI architecture for low-cost, high-quality volume rendering at full video frame rates. Derived from the Cube-4 architecture developed at SUNY Stony Brook, EM-Cube computes sample points and gradients on the fly to project 3-dimensional volume data onto 2-dimensional images with realistic lighting and shading.
A modest rendering system based on EM-Cube consists of a PCI card with four rendering chips (ASICs), four 64-Mbit SDRAMs to hold the volume data, and four SRAMs to capture the rendered image. The performance target for this configuration is to render images from a 256³ × 16-bit data set at 30 frames/sec. The EM-Cube architecture can be scaled to larger volume data sets and/or higher frame rates by adding additional ASICs, SDRAMs, and SRAMs. This paper addresses three major challenges encountered in developing EM-Cube into a practical product: exploiting the bandwidth inherent in the SDRAMs containing the volume data, keeping the pin count between adjacent ASICs at a tractable level, and reducing the on-chip storage required to hold the intermediate results of rendering.

Item: A Ray-Slice-Sweep Volume Rendering Engine (The Eurographics Association, 1997)
Authors: Bitter, Ingmar; Kaufman, Arie
Ray-slice-sweeping is a plane-sweep algorithm for volume rendering. The compositing buffer sweeps through the volume and combines the accumulated image with the new slice of just-projected voxels. The image combination is guided by sight rays from the viewpoint through every voxel of the new slice. Cube-4L is a volume rendering architecture which employs the ray-slice-sweeping algorithm. It improves on the Cube-4 architecture in three ways. First, during perspective projection all voxels of the dataset contribute to the rendering. Second, it computes gradients at the voxel positions, which improves accuracy and allows a more compact implementation. Third, Cube-4L has less control overhead than Cube-4.

Item: Memory Access Patterns of Occlusion-Compatible 3D Image Warping (The Eurographics Association, 1997)
Authors: Mark, William R.; Bishop, Gary
McMillan and Bishop's 3D image warp can be implemented efficiently by exploiting the coherency of its memory accesses.
We analyze this coherency and present algorithms that take advantage of it. These algorithms traverse the reference image in an occlusion-compatible order, which is an order that can resolve visibility using a painter's algorithm. Required cache sizes are calculated for several one-pass 3D warp algorithms, and we develop a two-pass algorithm which requires a smaller cache than any of the practical one-pass algorithms. We also show that reference image traversal orders that are occlusion-compatible for continuous images are not always occlusion-compatible when applied to the discrete images used in practice.

Item: Codesign of Graphics Hardware Accelerators (The Eurographics Association, 1997)
Authors: Ewins, Jon P.; Watten, Phil L.; White, Martin; McNeill, Michael D. J.; Lister, Paul F.
The design of a hardware architecture for a computer graphics pipeline requires a thorough understanding of the algorithms involved at each stage, and of the implications these algorithms have for the organisation of the pipeline architecture. The choice of algorithm, the flow of pixel data through the pipeline, and bit-width precision issues are crucial decisions in the design of new hardware accelerators. Making these decisions correctly requires intensive investigation and experimentation. Hardware description languages such as VHDL allow for sound top-down design methodologies, but their effectiveness in such experimental work is limited. This paper discusses the use of software tools as an aid to hardware development and presents applications that demonstrate the possibilities of this approach and the benefits that can be attained from an integrated codesign environment.

Item: VIZARD - Visualization Accelerator for Realtime Display (The Eurographics Association, 1997)
Authors: Knittel, Günter; Straßer, Wolfgang
Volume rendering has traditionally been an application for supercomputers, workstation networks or expensive special-purpose hardware. In contrast, this report shows how far we have reached using the other extreme: the low-end PC platform. We have alleviated the mismatch between this demanding application and the limited computational resources of a PC in three ways: several stages of the visualization pipeline are moved into a preprocessing step, the volume rendering algorithm was optimized using a special data compression scheme, and the algorithm has been implemented in hardware as a PCI-compatible coprocessor (VIZARD). These methods give us a frame rate of up to 10 Hz for 256³ data sets and acceptable image quality, although the accelerator prototype was built using relatively slow FPGA technology. In a low-cost environment a coprocessor must not be more expensive than the host itself, so VIZARD was designed to be manufacturable for a few hundred dollars. The special data compression scheme allows the data set to be placed in the main memory of the PC and eliminates the need for an expensive, separate volume memory. The entire visualization system consists of a portable PC with two built-in accelerator boards. Despite its small size, the system provides perspective raycasting for real-time walk-throughs. Additional features include stereoscopic viewing using shutter glasses and volume animation.

Item: Heresy: A Virtual Image-Space 3D Rasterization Architecture (The Eurographics Association, 1997)
Authors: Chiueh, Tzi-cker
With the advent of virtual reality and other visual applications that require photo and cinema realism, 3D graphics hardware has started to enter the mainstream. This paper describes the design and evaluation of a cost-effective, high-performance 3D graphics system called Heresy that is based on a virtual image-space architecture.
Heresy features three novel architectural mechanisms. First, the lazy shading mechanism makes the shading computation effort proportional to the screen area but independent of the scene complexity. Second, the speculative Z-buffer hardware allows one-cycle Z-value comparison, as opposed to four cycles in conventional designs. Third, to avoid the intermediate sorting required by a virtual image-space rasterization architecture, we develop an innovative display database traversal algorithm that is tailored to the given user projection views. With this technique, the sorting-induced delay and extra memory requirements associated with image-order rasterization are completely eliminated. By replicating the Heresy pipeline, it is estimated that the overall performance of the system can reach over 1 million Gouraud-shaded and 2D mip-mapped triangles per second at 20 frames/sec with 1K x 1K resolution per frame.

Item: Triangle Scan Conversion using 2D Homogeneous Coordinates (The Eurographics Association, 1997)
Authors: Olano, Marc; Greer, Trey
We present a new triangle scan conversion algorithm that works entirely in homogeneous coordinates. By using homogeneous coordinates, the algorithm avoids the costly clipping tests which make pipelined or hardware implementations of previous scan conversion algorithms difficult. The algorithm handles clipping by the addition of clip edges, without the need to actually split the clipped triangle. Furthermore, the algorithm can render true homogeneous triangles, including external triangles that should pass through infinity with two visible sections. An implementation of the algorithm on Pixel-Planes 5 runs about 33% faster than a similar implementation of the previous algorithm.

Item: Architectural Implications of Hardware-Accelerated Bucket Rendering on the PC (The Eurographics Association, 1997)
Authors: Cox, Michael; Bhandari, Narendra
Bucket rendering is a technique whereby a scene is sorted into screen-space tiles and each tile is rendered independently in turn. We expect hardware-accelerated bucket rendering to become available on the PC, and in this paper we explore the effect of such accelerators on main memory bandwidth, on bus bandwidth to the accelerator, and on increased triangle set-up requirements. The most important impact is due to the fact that primitives in general overlap multiple buckets, which is a direct cause of overhead. In this paper we evaluate bucket rendering that uses the most common algorithm for bucket sorting, one based on screen-aligned primitive bounding boxes. We extend previous techniques for analytically evaluating bounding-box bucket overlap, and together with experimental results use these to evaluate accelerators that may support 32x32-pixel tiles and those that may support 128x128-pixel tiles. We expect the former to be possible with dense SRAM, and the latter with DRAM embedded in a logic process (embedded DRAM). Our results suggest that embedded DRAM implementations can support bucket rendering with bounding-box bucket sorting, but that SRAM implementations will likely be at risk with respect to overall system performance when bounding-box bucket sorting is employed. These results suggest the need for more precise but still low-overhead bucket sorting algorithms when bucket rendering hardware is constrained to 32x32 tiles.
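The per-primitive overhead that the Cox and Bhandari abstract attributes to bounding-box bucket sorting can be illustrated with a short sketch. This is not code from the paper: the function name, tile sizes, and the example box are illustrative, and it only counts the tiles a screen-aligned bounding box touches, which is the number of buckets the primitive would be re-submitted to.

```python
def buckets_overlapped(x0, y0, x1, y1, tile=32):
    """Count the screen-space tiles touched by the pixel bounding box
    [x0, x1] x [y0, y1] on a grid of tile x tile buckets.

    With bounding-box bucket sorting, the primitive is processed once per
    touched bucket, so any count greater than 1 is redundant set-up work.
    """
    nx = x1 // tile - x0 // tile + 1   # tile columns spanned
    ny = y1 // tile - y0 // tile + 1   # tile rows spanned
    return nx * ny

# A 40x40-pixel bounding box can straddle 4 tiles of a 32x32 grid,
# but fits in a single 128x128 tile:
print(buckets_overlapped(20, 20, 59, 59, tile=32))   # 4
print(buckets_overlapped(20, 20, 59, 59, tile=128))  # 1
```

This makes the abstract's SRAM-versus-embedded-DRAM trade-off concrete: shrinking the tile from 128x128 to 32x32 multiplies the expected bucket overlap of typical primitives, and with it the triangle set-up overhead.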
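The key idea of the Olano and Greer abstract above, scan conversion entirely in 2D homogeneous coordinates, can be sketched in a few lines: each edge of a triangle with vertices (x, y, w) is the cross product of two vertices, so edge functions can be set up and evaluated without dividing by w and hence without clipping first. This is a minimal sketch of that idea under stated assumptions (unit w, coverage by a simple sign test), not the paper's algorithm; the names are illustrative.

```python
def cross3(a, b):
    """Cross product of two 2D-homogeneous points (x, y, w); the result is
    the line through them, as coefficients (a, b, c) of ax + by + cw = 0."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def edge_functions(v0, v1, v2):
    """Line coefficients for the three edges of a 2DH triangle."""
    return cross3(v1, v2), cross3(v2, v0), cross3(v0, v1)

def inside(edges, x, y):
    """A screen point (x, y) is covered when all edge functions agree in sign."""
    vals = [a * x + b * y + c for (a, b, c) in edges]
    return all(v >= 0 for v in vals) or all(v <= 0 for v in vals)

# Right triangle with unit w; pixel positions tested directly in screen space.
tri = ((0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (0.0, 4.0, 1.0))
edges = edge_functions(*tri)
print(inside(edges, 1.0, 1.0))  # True
print(inside(edges, 4.0, 4.0))  # False
```

Because the setup never divides by w, vertices behind the eye (w <= 0) need no special casing at this stage, which is what lets the full algorithm avoid the clipping tests the abstract mentions.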