A clustering approach for the unsupervised recognition of nonliteral language

In this thesis we present TroFi, a system for separating literal and nonliteral usages of verbs through unsupervised statistical word-sense disambiguation and clustering techniques. TroFi distinguishes itself by redefining the types of nonliteral language handled and by depending purely on sentential context rather than selectional constraint violations and paths in semantic hierarchies. TroFi uses literal and nonliteral seed sets acquired and cleaned without human supervision to bootstrap learning. We adapt a word-sense disambiguation algorithm to our task and add learners, a voting schema, SuperTags, and additional context. Detailed experiments on hand-annotated data and the introduction of active learning and iterative augmentation allow us to build the TroFi Example Base, an expandable resource of literal/nonliteral usage clusters for the NLP community. We also describe some other possible applications of TroFi and the TroFi Example Base. Our basic algorithm outperforms the baseline by 24.4%. Adding active learning increases this to over 35%.

The author has placed restrictions on the PDF copy of this thesis. The PDF is neither printable nor copyable. If you would like the SFU Library to attempt to contact the author for permission to print a copy, please email your request to summit-permissions@sfu.ca.
Copyright remains with the author
School of Computing Science - Simon Fraser University
Thesis type: (Computing Science) Thesis (M.Sc.)