EGPGV15: Eurographics Symposium on Parallel Graphics and Visualization
Browsing EGPGV15: Eurographics Symposium on Parallel Graphics and Visualization by Subject "Hardware Architecture"
Packet-Oriented Streamline Tracing on Modern SIMD Architectures
(The Eurographics Association, 2015) Hentschel, Bernd; Göbbert, Jens Henrik; Klemm, Michael; Springer, Paul; Schnorr, Andrea; Kuhlen, Torsten W.; Eds. C. Dachsbacher and P. Navrátil
The advection of integral lines is an important computational kernel in vector field visualization. We investigate how this kernel can profit from vector (SIMD) extensions in modern CPUs. As a baseline, we formulate a streamline tracing algorithm that facilitates auto-vectorization by an optimizing compiler. We analyze this algorithm and propose two different optimizations. Our results show that particle tracing does not per se benefit from SIMD computation. Based on a careful analysis of the auto-vectorized code, we propose an optimized data access routine and a re-packing scheme which increases average SIMD efficiency. We evaluate our approach on three different turbulent flow fields. Our optimized approaches increase integration performance by a factor of up to 5.6 over our baseline measurement. We conclude with a discussion of current limitations and aspects for future work.

TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism
(The Eurographics Association, 2015) Grosset, A. V. Pascal; Prasad, Manasa; Christensen, Cameron; Knoll, Aaron; Hansen, Charles; Eds. C. Dachsbacher and P. Navrátil
Modern supercomputers have very powerful multi-core CPUs. The programming model on these supercomputers is switching from pure MPI to MPI for inter-node communication, and shared memory and threads for intra-node communication. Consequently, the bottleneck in most systems is no longer computation but communication between nodes. In this paper, we present a new compositing algorithm for hybrid MPI parallelism that focuses on communication avoidance and overlapping communication with computation at the expense of evenly balancing the workload. The algorithm has three stages: a direct send stage where nodes are arranged in groups and exchange regions of an image, followed by a tree compositing stage and a gather stage. We compare our algorithm with radix-k and binary-swap from the IceT library in a hybrid OpenMP/MPI setting, show strong scaling results and explain how we generally achieve better performance than these two algorithms.
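
As an illustration only, the three stages named in the TOD-Tree abstract (direct send within groups, tree compositing, gather) can be sketched as a single-process simulation. This is not the authors' implementation: there is no MPI or threading here, the overlap of communication with computation is not modeled, and the function name `tod_tree_sketch`, the `group_size` parameter, the one-region-per-group-member layout, and the premultiplied-alpha pixel format are all assumptions made for this example.

```python
# Hypothetical single-process sketch; "ranks" are list entries, not MPI processes.
from functools import reduce


def over(front, back):
    """Premultiplied-alpha 'over' operator for two RGBA pixels."""
    t = 1.0 - front[3]
    return tuple(f + t * b for f, b in zip(front, back))


def composite(front_img, back_img):
    """Composite two equally sized pixel lists, front over back."""
    return [over(f, b) for f, b in zip(front_img, back_img)]


def tod_tree_sketch(images, group_size):
    """images[k] is rank k's full image; a lower rank is closer to the viewer."""
    n_ranks, n_pixels = len(images), len(images[0])
    assert n_ranks % group_size == 0 and n_pixels % group_size == 0
    region_len = n_pixels // group_size
    groups = [list(range(g, g + group_size))
              for g in range(0, n_ranks, group_size)]

    # Stage 1 (direct send): member i of each group collects region i
    # from its group peers and composites the pieces front to back.
    partial = []  # partial[g][i] = region i composited over group g
    for members in groups:
        row = []
        for i in range(group_size):
            lo, hi = i * region_len, (i + 1) * region_len
            pieces = [images[k][lo:hi] for k in members]
            row.append(reduce(composite, pieces))
        partial.append(row)

    # Stage 2 (tree): merge neighbouring groups pairwise, region by region,
    # keeping depth order, until a single set of regions remains.
    while len(partial) > 1:
        merged = []
        for g in range(0, len(partial) - 1, 2):
            merged.append([composite(a, b)
                           for a, b in zip(partial[g], partial[g + 1])])
        if len(partial) % 2:
            merged.append(partial[-1])  # odd group waits for the next round
        partial = merged

    # Stage 3 (gather): concatenate the finished regions into one image.
    return [px for region in partial[0] for px in region]
```

Because the "over" operator is associative, the grouped and tree-ordered compositing should agree with a plain front-to-back fold, `reduce(composite, images)`, up to floating-point rounding, provided the depth order of neighbouring ranks is preserved as it is here.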