Skip to main content

High-Performance Regular Expression Matching with Parabix and LLVM

Resource type
Thesis type
(Thesis) M.Sc.
Date created
This thesis investigates the feasibility of constructing a high-performance, Unicode-capable, regular expression search tool by combining parallel bit stream technologies and algorithms together with the dynamic compilation capabilities of the LLVM compiler infrastructure. A prototype implementation of icGREP successfully demonstrates the feasibility of this undertaking, with asymptotic performance fully in line with that predicted by earlier prototyping work. The icGREP implementation extends the Parabix regular expression algorithms to include new techniques for efficient Unicode character matching. Performance evaluations in comparison with other Unicode-capable regular expression search tools show asymptotic performance advantages that are often over 10X, although the overhead of dynamiccompilation techniques confines the benefits to relatively large input files.
Copyright statement
Copyright is held by the author.
The author granted permission for the file to be printed, but not for the text to be copied and pasted.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Cameron, Robert
Member of collection
Download file Size
etd8659_DDenis.pdf 3.14 MB

Views & downloads - as of June 2023

Views: 30
Downloads: 0