Contextual word representations pre-trained on large amounts of text have advanced the state of the art in many natural language processing (NLP) tasks. Most recent approaches pre-train such models using a language modeling (LM) objective. In this thesis, we compare and contrast these LM-based models with the encoder of an encoder-decoder model pre-trained using a machine translation (MT) objective. For certain tasks, such as word sense disambiguation, MT intuitively provides a better pre-training objective: different senses of a word tend to translate differently into a target language, whereas an LM objective does not always require word senses to be distinguished. Our experimental results on word sense disambiguation provide insight into pre-training objective functions and can help guide future work on large-scale pre-trained models for transfer learning in NLP.