Streaming text processing is important for transforming and analyzing the rapidly growing data in modern society. Unfortunately, text processing software written using the sequential byte-at-a-time processing model fails to take full advantage of the resources available on modern processors for many reasons, including significant branch misprediction penalties due to the input-dependent branching structures of text processing applications, cache miss penalties for table-based operations applied per byte of input, and logical complexity that makes it difficult to process more than a single-byte at a time. Common solutions to process text streams that have dependencies from start to end may involve state machine or recursive algorithms, which are generally considered hard to parallelize and hence ill-suited for multicore or manycore processors. However, the Parabix approach to text processing has recently been shown to offer a promising alternative, based on the concept of bitwise data parallelism: a transform representation of text that uses the full width of available processor registers at a density of one bit per input byte. This dissertation investigates the further development of the Parabix framework to incorporate multidimensional parallelization, combining Parabix methods with several different models of multithreading such as task parallelism, data parallelism and pipeline parallelism as well as with GPU-based SIMT processing. A form of data-pipeline parallelism is developed and shown to be beneficial for text-processing applications even with strong sequential state dependencies. Compilers for both pure pipeline parallelism and data-pipeline parallelism are developed and integrated into Parabix framework to provide automated multithreading support for any Parabix applications. Methods for task parallelism and data parallelism are also developed, but need to be customized by the programmer for specific applications. GPU support is added to Parabix framework by translating LLVM IR into PTX, which can be compiled into binary code and run on GPU devices. Programmers can simply choose to use NVPTX driver instead of CPU driver for code that is executed on GPU. Several applications based on Parabix are implemented and tested with different parallelization techniques to analyze the advantages and limits of multidimensional parallelization extensions of Parabix framework. In data-pipeline mode, we are able to achieve 215% speedup compared with the sequential version on a quad-core machine. The GPU implementation has its limitations but can give up to 310% speed-up.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Member of collection