EGPGV21: Eurographics Symposium on Parallel Graphics and Visualization
Browsing EGPGV21: Eurographics Symposium on Parallel Graphics and Visualization by Issue Date
Scalable In Situ Computation of Lagrangian Representations via Local Flow Maps
(The Eurographics Association, 2021) Sane, Sudhanshu; Yenpure, Abhishek; Bujack, Roxana; Larsen, Matthew; Moreland, Kenneth; Garth, Christoph; Johnson, Chris R.; Childs, Hank
Editors: Larsen, Matthew and Sadlo, Filip
In situ computation of Lagrangian flow maps to enable post hoc time-varying vector field analysis has recently become an active area of research. However, the current literature is largely limited to theoretical settings and lacks a solution to address scalability of the technique in distributed memory. To improve scalability, we propose and evaluate the benefits and limitations of a simple, yet novel, performance optimization. Our proposed optimization is a communication-free model resulting in local Lagrangian flow maps, requiring no message passing or synchronization between processes, intrinsically improving scalability, and thereby reducing overall execution time and alleviating the encumbrance placed on simulation codes from communication overheads. To evaluate our approach, we computed Lagrangian flow maps for four time-varying simulation vector fields and investigated how execution time and reconstruction accuracy are impacted by the number of GPUs per compute node, the total number of compute nodes, particles per rank, and storage intervals. Our study consisted of experiments computing Lagrangian flow maps with up to 67M particle trajectories over 500 cycles and used as many as 2048 GPUs across 512 compute nodes. In all, our study contributes an evaluation of a communication-free model as well as a scalability study of computing distributed Lagrangian flow maps at scale using in situ infrastructure on a modern supercomputer.

Interactive Selection on Calculated Attributes of Large-Scale Particle Data
(The Eurographics Association, 2021) Wollet, Benjamin; Reinhardt, Stefan; Weiskopf, Daniel; Eberhardt, Bernhard
Editors: Larsen, Matthew and Sadlo, Filip
We present a GPU-based technique for efficient selection in interactive visualizations of large particle datasets. In particular, we address multiple attributes attached to particles, such as pressure, density, or surface tension. Unfortunately, such intermediate attributes are often available only during the simulation run. They are either not accessible during visualization or have to be saved as additional information along with the usual simulation data. The latter increases the size of the dataset significantly, and the required variables may not be known in advance. Therefore, we choose to compute intermediate attributes on the fly. In this way, we are even able to obtain attributes that were not calculated by the simulation but may be relevant for data analysis or debugging. We present an interactive selection technique designed for such attributes. It leverages spatial regions of the selection to efficiently compute attributes only where needed. This lazy evaluation also works for intelligent and data-driven selection, extending the region to include neighboring particles. Our technique is evaluated by measurements of performance scalability and case studies for typical usage examples.

Performance Tradeoffs in Shared-memory Platform Portable Implementations of a Stencil Kernel
(The Eurographics Association, 2021) Bethel, E. Wes; Heinemann, Colleen; Perciano, Talita
Editors: Larsen, Matthew and Sadlo, Filip
Building on a significant amount of current research that examines the idea of platform-portable parallel code across different types of processor families, this work focuses on two sets of related questions. First, using a performance analysis methodology that leverages multiple metrics including hardware performance counters and elapsed time on both CPU and GPU platforms, we examine the performance differences that arise when using two common platform-portable parallel programming approaches, namely OpenMP and VTK-m, for a stencil-based computation, which serves as a proxy for many different types of computations in visualization and analytics. Second, we explore the performance differences that result when using coarser- and finer-grained parallelism approaches that are afforded by both OpenMP and VTK-m.

UnityPIC: Unity Point-Cloud Interactive Core
(The Eurographics Association, 2021) Wu, Yaocheng; Vo, Huy; Gong, Jie; Zhu, Zhigang
Editors: Larsen, Matthew and Sadlo, Filip
In this work, we present Unity Point-Cloud Interactive Core, a novel interactive point cloud rendering pipeline for the Unity Development Platform. The goal of the proposed pipeline is to expedite the development process for point cloud applications by encapsulating the rendering process as a standalone component, while maintaining flexibility through an implementable interface. The proposed pipeline allows for rendering arbitrarily large point clouds with improved performance and visual quality. First, a novel dynamic batching scheme is proposed to address the adaptive point sizing problem for level-of-detail (LOD) point cloud structures. Then, an approximate rendering algorithm is proposed to reduce overdraw by minimizing the overall number of fragment operations through an intermediate occlusion culling pass. For the purpose of analysis, the visual quality of renderings is quantified and measured by comparing against a high-quality baseline. In the experiments, the proposed pipeline maintains above 90 FPS for a 20 million point budget while achieving greater than 90% visual quality during interaction when rendering a point cloud with more than 20 billion points.

PGV 2021: Frontmatter
(The Eurographics Association, 2021) Larsen, Matthew; Sadlo, Filip
Editors: Larsen, Matthew and Sadlo, Filip

Machine Learning-Based Autotuning for Parallel Particle Advection
(The Eurographics Association, 2021) Schwartz, Samuel D.; Childs, Hank; Pugmire, David
Editors: Larsen, Matthew and Sadlo, Filip
Data-parallel particle advection algorithms contain multiple controls that affect their execution characteristics and performance, in particular how often to communicate and how much work to perform between communications. Unfortunately, the optimal settings for these controls vary based on workload, and, further, it is not easy to devise straightforward heuristics that automate calculation of these settings. To solve this problem, we investigate a machine learning-based autotuning approach for optimizing data-parallel particle advection. During a pre-processing step, we train multiple machine learning techniques using a corpus of performance data that includes results across a variety of workloads and control settings. The best performing of these techniques is then used to form an oracle, i.e., a module that can determine good algorithm control settings for a given workload immediately before execution begins. To evaluate this approach, we assessed the ability of seven machine learning models to capture particle advection performance behavior and then ran experiments for 108 particle advection workloads on 64 GPUs of a supercomputer. Our findings show that our machine learning-based oracle achieves good speedups relative to the available gains.

Evaluation of PyTorch as a Data-Parallel Programming API for GPU Volume Rendering
(The Eurographics Association, 2021) Marshak, Nathan X.; Grosset, A. V. Pascal; Knoll, Aaron; Ahrens, James; Johnson, Chris R.
Editors: Larsen, Matthew and Sadlo, Filip
Data-parallel programming (DPP) has attracted considerable interest from the visualization community, fostering major software initiatives such as VTK-m. However, there has been relatively little recent investigation of data-parallel APIs in higher-level languages such as Python, which could help developers sidestep the need for low-level application programming in C++ and CUDA. Moreover, machine learning frameworks exposing data-parallel primitives, such as PyTorch and TensorFlow, have exploded in popularity, making them attractive platforms for parallel visualization and data analysis. In this work, we benchmark data-parallel primitives in PyTorch, and investigate its application to GPU volume rendering using two distinct DPP formulations: a parallel scan and reduce over the entire volume, and repeated application of data-parallel operators to an array of rays. We find that most relevant DPP primitives exhibit performance similar to a native CUDA library. However, our volume rendering implementation reveals that PyTorch is limited in expressiveness when compared to other DPP APIs. Furthermore, while render times are sufficient for an early "proof of concept", memory usage acutely limits scalability.

HyLiPoD: Parallel Particle Advection Via a Hybrid of Lifeline Scheduling and Parallelization-Over-Data
(The Eurographics Association, 2021) Binyahib, Roba; Pugmire, David; Childs, Hank
Editors: Larsen, Matthew and Sadlo, Filip
Performance characteristics of parallel particle advection algorithms can vary greatly based on workload. With this short paper, we build a new algorithm based on results from a previous bake-off study which evaluated the performance of four algorithms on a variety of workloads. Our algorithm, called HyLiPoD, is a "meta-algorithm," i.e., it considers the desired workload to choose from existing algorithms to maximize performance. To demonstrate HyLiPoD's benefit, we analyze results from 162 tests including concurrencies of up to 8192 cores, meshes as large as 34 billion cells, and particle counts as large as 300 million. Our findings demonstrate that HyLiPoD's adaptive approach allows it to match the best performance of existing algorithms across diverse workloads.

Faster RTX-Accelerated Empty Space Skipping using Triangulated Active Region Boundary Geometry
(The Eurographics Association, 2021) Wald, Ingo; Zellmann, Stefan; Morrical, Nate
Editors: Larsen, Matthew and Sadlo, Filip
We describe a technique for GPU and RTX accelerated space skipping of structured volumes that improves on prior work by replacing clustered proxy boxes with a GPU-extracted triangle mesh that bounds the active regions. Unlike prior methods, our technique avoids costly clustering operations, significantly reduces data structure construction cost, and incurs less overhead when traversing active regions.
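
The idea in the last abstract, bounding the active regions of a structured volume with triangles, can be illustrated with a small CPU sketch. This is not the authors' GPU/RTX implementation. It assumes that a cell counts as "active" when its scalar value exceeds a threshold (a stand-in for a transfer-function test), and the names `active_cells` and `boundary_triangles` are hypothetical. A face is emitted wherever an active cell touches an inactive or out-of-volume neighbor, and each face is split into two triangles.

```python
def active_cells(volume, threshold):
    """Return the set of cells whose scalar exceeds the threshold.

    volume: dict mapping integer (x, y, z) cell coordinates to a scalar.
    The threshold test is an assumed stand-in for a transfer-function lookup.
    """
    return {cell for cell, value in volume.items() if value > threshold}


def boundary_triangles(active):
    """Triangulate the faces separating active cells from everything else.

    active: set of integer (x, y, z) cell coordinates.
    Returns a list of triangles, each a tuple of three (x, y, z) corners.
    Winding order is not made consistent in this sketch.
    """
    triangles = []
    for cell in active:
        for axis in range(3):
            for side in (0, 1):
                neighbor = list(cell)
                neighbor[axis] += 1 if side else -1
                if tuple(neighbor) in active:
                    continue  # interior face: both sides are active
                # The two axes spanning the face.
                u, v = [a for a in range(3) if a != axis]
                corners = []
                for du, dv in ((0, 0), (1, 0), (1, 1), (0, 1)):
                    c = list(cell)
                    c[axis] += side
                    c[u] += du
                    c[v] += dv
                    corners.append(tuple(c))
                # Split the quad into two triangles.
                triangles.append((corners[0], corners[1], corners[2]))
                triangles.append((corners[0], corners[2], corners[3]))
    return triangles


if __name__ == "__main__":
    volume = {(0, 0, 0): 0.9, (1, 0, 0): 0.8, (2, 0, 0): 0.1}
    mesh = boundary_triangles(active_cells(volume, 0.5))
    print(len(mesh), "boundary triangles")
```

Faces shared by two active cells are skipped, so the mesh encloses each connected active region rather than every individual cell.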