Clifton, Ann

Resource type

Thesis

Thesis type

(Dissertation) Ph.D.

Date created

2015-10-27

Authors/Contributors

Author: Clifton, Ann

Abstract

Natural language is rich with layers of implicit structure, and previous research has shown that we can take advantage of this structure to make more accurate models. Most attempts to utilize forms of implicit natural language structure for natural language processing (NLP) tasks have assumed a pre-defined structural analysis before training the task-specific model. However, rather than fixing the latent structure, we may wish to discover the latent structure that is most useful via feedback from an extrinsic task. The focus of this thesis is on jointly learning the best latent analysis along with the model for the NLP task we are interested in. In this work, we present a generalized learning framework for discriminative training over jointly learned latent structures, and apply this to several NLP tasks. We develop a high-accuracy discriminative language model over shallow parse structures. We demonstrate an efficient algorithm for learning this grammaticality classifier by combining the input of multiple representations of the latent structures. Next, we set forth a framework for latent structure learning for statistical machine translation (SMT), in which the latent segmentation and alignment of the parallel training data inform the translation model. This model jointly optimizes segmentation and alignment for the translation task, and is novel in learning over latent representations of the input. We also propose a discriminative bilingual topic model over hierarchically structured latent topics, which allows for weighted contributions from more informative inputs and can be optimized for SMT. We apply this model to morphological disambiguation and domain adaptation for SMT. Finally, we investigate large-scale distributed training for structured discriminative models and offer recommendations for distributed computational topologies.

Keywords

Identifier

etd9353

Copyright statement

Copyright is held by the author.

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Scholarly level

Graduate student (PhD)

Supervisor or Senior Supervisor

Thesis advisor: Sarkar, Anoop

Member of collection

Computing Science Theses

Download file	Size
etd9353_AClifton.pdf	794.18 KB

Latent structure discriminative learning for natural language processing
