This thesis investigates the feasibility of constructing a high-performance, Unicode-capable, regular expression search tool by combining parallel bit stream technologies and algorithms together with the dynamic compilation capabilities of the LLVM compiler infrastructure. A prototype implementation of icGREP successfully demonstrates the feasibility of this undertaking, with asymptotic performance fully in line with that predicted by earlier prototyping work. The icGREP implementation extends the Parabix regular expression algorithms to include new techniques for efficient Unicode character matching. Performance evaluations in comparison with other Unicode-capable regular expression search tools show asymptotic performance advantages that are often over 10X, although the overhead of dynamiccompilation techniques confines the benefits to relatively large input files.
Copyright is held by the author.
The author granted permission for the file to be printed, but not for the text to be copied and pasted.
Supervisor or Senior Supervisor
Thesis advisor: Cameron, Robert
Member of collection