A high-throughput dependency parser

Resource type
Thesis type
(Thesis) M.Sc.
Date created
2017-10-17
Authors/Contributors
Abstract
Dependency parsing is an important task in NLP, and it is used in many downstream tasks for analyzing the semantic structure of sentences. Analyzing very large corpora in a reasonable amount of time, however, requires a fast parser. In this thesis we develop a transition-based dependency parser with a neural-network decision function which outperforms spaCy, Stanford CoreNLP, and MALTParser in terms of speed while having a comparable, and in some cases better, accuracy. We also develop several variations of our model to investigate the trade-off between accuracy and speed. This leads to a model with a greatly reduced feature set which is much faster but less accurate, as well as a more complex model involving a BiLSTM simultaneously trained to produce POS tags which is more accurate, but much slower. We compare the accuracy and speed of our different parser models against the three mentioned parsers on the Penn Treebank, Universal Dependencies English, and Ontonotes datasets using two different dependency tree representations to show how our parser competes on data from very different domains. Our experimental results reveal that our main model is much faster than the 3 external parsers while also being more accurate in some cases; our reduced feature set model is significantly faster while remaining competitive in terms of accuracy; and our BiLSTM-using model is somewhat faster than CoreNLP and is significantly more accurate.
Document
Identifier
etd10435
Copyright statement
Copyright is held by the author.
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Sarkar, Anoop
Member of collection
Attachment Size
etd10435_AVacariu.pdf 538.42 KB