Resource type
Thesis
Thesis type
(Thesis) M.Sc.
Date created
2018-11-26
Authors/Contributors
Author: Ambartsoumian, Artaches
Abstract
Many machine learning tasks are structured as sequence modeling problems, predominantly dealing with text and data with a time dimension. It is thus very important to have a model that is good at capturing both short-range and long-range dependencies across sequence steps. Many approaches have been used over the past few decades, with various neural network architectures becoming the standard in recent years. The main neural network architecture types that have been applied are recurrent neural networks (RNNs) and convolutional neural networks (CNNs). In this work, we explore a new type of neural network architecture, self-attention networks (SANs), by testing it on the sequence modeling tasks of sentiment analysis classification and time-series regression. First, we perform a detailed comparison between simple SANs, RNNs, and CNNs on six sentiment analysis datasets, where we demonstrate that SANs achieve higher classification accuracy while also offering better model characteristics than RNNs, such as faster training and inference times, fewer trainable parameters, and lower memory consumption during training. Next, we propose a more complex self-attention-based architecture called ESSAN and use it to achieve state-of-the-art (SOTA) results on the Stanford Sentiment Treebank fine-grained sentiment analysis dataset. Finally, we apply our ESSAN architectures to the regression task of multivariate time-series prediction. Our preliminary results show that ESSAN once again achieves SOTA results, outperforming previous SOTA RNN-with-attention architectures.
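For readers unfamiliar with the mechanism behind SANs, the following is a minimal sketch of single-head scaled dot-product self-attention, the core operation such networks are built on. It assumes the standard formulation (as in Vaswani et al., 2017); the weight names, shapes, and NumPy implementation here are illustrative and do not reflect the thesis's actual ESSAN architecture.

```python
# Minimal sketch of single-head scaled dot-product self-attention.
# Names and shapes are illustrative, not the thesis's ESSAN model.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) input sequence; w_*: (d_model, d_k) projections."""
    q = x @ w_q                                # queries
    k = x @ w_k                                # keys
    v = x @ w_v                                # values
    scores = q @ k.T / np.sqrt(k.shape[-1])    # pairwise similarity, scaled
    # Softmax over keys: every position attends to every other position,
    # so long-range dependencies are a single step away (unlike in RNNs,
    # where information must propagate through each intermediate step).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                         # (seq_len, d_k) contextual outputs

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 8, 16, 16
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (8, 16)
```

Because all pairwise interactions are computed in parallel rather than sequentially, this construction also accounts for the faster training and inference times the abstract reports relative to RNNs.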
Document
Identifier
etd19959
Copyright statement
Copyright is held by the author.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Popowich, Fred
Member of collection
| Download file | Size |
|---|---|
| etd19959.pdf | 894.08 KB |