Skip to main content

Advanced techniques for bounded and unbounded repetition in parabix regular expression search

Resource type
Thesis type
(Thesis) M.Sc.
Date created
2017-09-12
Authors/Contributors
Author: Xue, Dong
Abstract
The three-level architecture of the regular expression search tool named icGrep, which is based on a pure parallel Parabix framework, has shown great speedup compared to conventional search tools. This thesis proposed some advanced techniques for bounded and unbounded repetition in Parabix regular expression search. We first accelerated the bounded repetition type of Unicode unit-length regular expressions by utilizing a log2 technique with the UTF8-to-UTF32 pipeline. To reduce the overhead brought about by the UTF-8-to-UTF-32 transformation, the multiplexed character classes concept was proposed. For the unbounded repetition part, we have reviewed finite automata theory for application to Parabix regular expression matching and proposed a totally different compile pipeline for the local language. In the meanwhile, we proposed star-normal-form optimization to make the RE abstract syntax tree less complex and less ambiguous. All of these innovative techniques have demonstrated their performance dealing with repetition against the basic pipeline.
Document
Identifier
etd10390
Copyright statement
Copyright is held by the author.
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Cameron, Robert D.
Member of collection
Download file Size
etd10390_DXUE.pdf 1.81 MB

Views & downloads - as of June 2023

Views: 0
Downloads: 0