Soleimani Nasab, Mohammad Mahdi

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2016-04-01

Authors/Contributors

Author: Soleimani Nasab, Mohammad Mahdi

Abstract

In many natural language processing (NLP) tasks, a large amount of unlabelled data is available while labelled data is hard to obtain. Bootstrapping techniques have been shown to be very successful on a variety of NLP tasks using only a small amount of supervision. In this research we study different bootstrapping techniques that separate the training step of the algorithm from the decoding step, which produces the argmax label on test data. We then explore generative models trained in the conventional way using the EM algorithm, but with an initialization step and a decoding technique similar to those of the Yarowsky bootstrapping algorithm. The new approach is tested on the named entity classification and word sense disambiguation tasks and shows significant improvements over previous generative models.
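The abstract's central idea is that training (here, EM on a generative model started from a few seed rules) and decoding (the rule that turns a trained model into a label for a test example) can be chosen separately. The toy sketch below illustrates that separation. It is not the thesis's implementation. It assumes a naive Bayes model over sets of string features, seed rules given as a hypothetical feature-to-label dictionary, and a "Yarowsky-style" decoder taken here to mean labelling an example by its single most confident feature, as a decision list would. All function names are hypothetical.

```python
import math
from collections import defaultdict


def init_from_seeds(examples, seeds, labels):
    """Initial posteriors: one-hot if the example matches seed rules for exactly
    one label, uniform otherwise (an assumed Yarowsky-like initialization)."""
    q = []
    for x in examples:
        hits = {seeds[f] for f in x if f in seeds}
        if len(hits) == 1:
            y0 = hits.pop()
            q.append({y: 1.0 if y == y0 else 0.0 for y in labels})
        else:
            q.append({y: 1.0 / len(labels) for y in labels})
    return q


def m_step(examples, q, labels, vocab, alpha=1.0):
    """Re-estimate p(y) and p(f|y) from soft counts with add-alpha smoothing."""
    prior = {y: sum(qi[y] for qi in q) + alpha for y in labels}
    z = sum(prior.values())
    prior = {y: v / z for y, v in prior.items()}
    counts = {y: defaultdict(float) for y in labels}
    for x, qi in zip(examples, q):
        for f in x:
            for y in labels:
                counts[y][f] += qi[y]
    theta = {}
    for y in labels:
        total = sum(counts[y].values()) + alpha * len(vocab)
        theta[y] = {f: (counts[y][f] + alpha) / total for f in vocab}
    return prior, theta


def log_joint(x, y, prior, theta):
    """log p(y) + sum of log p(f|y) over features seen in training."""
    return math.log(prior[y]) + sum(math.log(theta[y][f]) for f in x if f in theta[y])


def e_step(examples, prior, theta, labels):
    """Posterior p(y|x) for each example, normalized in log space for stability."""
    q = []
    for x in examples:
        logs = {y: log_joint(x, y, prior, theta) for y in labels}
        m = max(logs.values())
        unnorm = {y: math.exp(v - m) for y, v in logs.items()}
        z = sum(unnorm.values())
        q.append({y: v / z for y, v in unnorm.items()})
    return q


def train_em(examples, seeds, labels, iters=10):
    """Training step: seed-based initialization followed by EM iterations."""
    vocab = {f for x in examples for f in x}
    q = init_from_seeds(examples, seeds, labels)
    prior, theta = m_step(examples, q, labels, vocab)
    for _ in range(iters):
        q = e_step(examples, prior, theta, labels)
        prior, theta = m_step(examples, q, labels, vocab)
    return prior, theta


def decode_argmax(x, prior, theta, labels):
    """Conventional decoding: the label with the highest posterior p(y|x)."""
    return max(labels, key=lambda y: log_joint(x, y, prior, theta))


def decode_max_feature(x, theta, labels):
    """Assumed Yarowsky-style decoding: the label of the single most confident
    feature, scored by p(y|f) under a uniform class prior. Returns None when
    no feature of x was seen in training."""
    best, best_score = None, -1.0
    for f in x:
        if f not in theta[labels[0]]:
            continue  # feature never seen during training
        z = sum(theta[y][f] for y in labels)
        for y in labels:
            score = theta[y][f] / z
            if score > best_score:
                best, best_score = y, score
    return best
```

Both decoders read the same trained parameters, so swapping one for the other changes only the decoding step and leaves training untouched, which is the kind of separation the abstract describes. For example, with labels `["PER", "LOC"]` and seeds `{"Mr.": "PER", "in": "LOC"}`, calling `train_em` and then `decode_max_feature({"near"}, theta, labels)` on a small feature-set corpus returns whichever label the feature `near` co-occurred with most under the learned model.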

Keywords

Identifier

etd9579

Copyright statement

Copyright is held by the author.

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Scholarly level

Graduate student (Masters)

Supervisor or Senior Supervisor

Thesis advisor: Sarkar, Anoop

Member of collection

Computing Science Theses

Download file	Size
etd9579_MSoleimaniNasab.pdf	559.69 KB

Title

On the importance of decoding in semi-supervised learning
