Towards feature-aware graph processing on the GPU

Thesis type
(Thesis) M.Sc.
Date created
Author: Vaz, Lynus
Unlike traditional graph processing applications, graph-based learning algorithms like Belief Propagation and Multimodal Learning require complex data such as feature vectors and matrices residing on graph vertices and edges, and employ vector/matrix operations on this data. GPU-based high-performance graph processing frameworks utilize clever techniques to mitigate the effect of random global memory accesses arising from irregular graph structure, and also perform efficient load balancing. However, these frameworks are oblivious to algorithm-specific details like the nature of operations involved and the vertex/edge property types used, and hence they end up generating unnecessary random global memory accesses. Moreover, traditional graph processing frameworks often force the user to follow a strict sequence of operations, which does not capture the nuances of different control flows in graph-based learning algorithms. In this thesis, we present Onyx, a feature-aware framework for graph-based learning algorithms on the GPU. Onyx employs a feature-aware processing model where each vertex property is collectively computed by a group of threads. This allows accesses to be coalesced into fewer global memory transactions, improving memory utilization. Onyx also incorporates dynamic vertex activation to perform sparse computations as vertex properties stabilize over time. The user expresses computations in the form of parallel operations on vertex and edge features, providing flexibility for custom control flows that support different kinds of graph-based learning algorithms. To extract high performance, Onyx automatically folds multiple parallel vertex- and edge-feature operations into a single kernel at compile-time. This eliminates the overhead of repeated kernel launches, and permits the use of low-latency shared memory as intermediate storage. We utilize GPU instructions to efficiently perform collaborative operations across vertex and edge features such as normalization, reduction and feature-level change detection. Finally, as feature-aware processing reduces the computation done per thread, we organized the critical path in Onyx as pipelined steps to minimize expensive dependency stalls. Our evaluation shows that Onyx's feature-aware processing decreases the number of atomic transactions and simultaneously increases global load efficiency. Together with change-driven computation this results in up to 20.3x speedup. We also implemented the graph-based learning algorithms on state-of-the-art GPU graph frameworks, and observe that Onyx outperforms them by up to 51.2x.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Vora, Keval
Member of collection
Attachment Size
input_data\21856\etd21329.pdf 1.04 MB