Ensembles of diverse clustering-based discriminative dependency parsers

Date created: 
Natural language processing, discriminative dependency parsing, clustering algorithms, ensemble learning

Syntactic parsing, and dependency parsing in particular, is a core component of many Natural Language Processing (NLP) tasks and applications. Improvements in dependency parsing can help improve machine translation and information extraction, among many other applications. In this thesis, we extend the dependency parsing framework of Koo, Carreras, and Collins (2008), which uses a single clustering method for semi-supervised learning. We use multiple diverse clustering methods to build multiple discriminative dependency parsing models in the Maximum Spanning Tree (MST) parsing framework (McDonald, Crammer, and Pereira, 2005). These diverse clustering-based parsers are then combined using a novel ensemble model, which performs exact inference on the shared hypothesis space of all the parser models. We show that the diverse clustering-based parser models and the ensemble method together significantly improve unlabeled dependency accuracy from 90.82% to 92.46% on Section 23 of the Penn Treebank. We also show significant improvements in domain adaptation to the Switchboard and Brown corpora.

Document type: 
Copyright remains with the author. The author granted permission for the file to be printed and for the text to be copied and pasted.
Senior supervisor: 
Anoop Sarkar
Applied Science: School of Computing Science
Thesis type: 
(Thesis) M.Sc.