Translation versus language model pre-training objectives for word sense disambiguation

Date created: 
2019-12-16
Identifier: 
etd20681
Keywords: 
Natural Language Processing
Language Models
Machine Translation
Abstract: 

Contextual word representations pre-trained on large text data have advanced the state of the art in many tasks in Natural Language Processing. Most recent approaches pre-train such models using a language modeling (LM) objective. In this thesis, we compare and contrast such language models with the encoder of an encoder-decoder model pre-trained using a machine translation (MT) objective. For certain tasks, such as word sense disambiguation, the MT task provides an intuitively better pre-training objective, since different senses of a word tend to translate differently into a target language, while word senses might not always need to be distinguished when using an LM objective. Our experimental results on word sense disambiguation provide insight into pre-training objective functions and can help guide future work on large-scale pre-trained models for transfer learning in NLP.

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.