Speed versus accuracy in neural sequence tagging for natural language processing

Author: 
Date created: 
2017-09-12
Identifier: 
etd10400
Keywords: 
Natural Language Processing
Sequence Tagging
Neural Networks
Abstract: 

Sequence tagging, which includes part-of-speech tagging, chunking, and named entity recognition, is an important task in NLP. Recurrent neural network models such as bidirectional LSTMs have produced impressive results on sequence tagging. In this work, we first present a bidirectional LSTM neural network model for sequence tagging tasks. We then present a simple and fast greedy sequence tagging system that uses a feedforward neural network, and compare the speed and accuracy of the bidirectional LSTM model and the greedy feedforward model. In addition, we propose two new models based on Mention2Vec by Stratos (2016): Feedforward-Mention2Vec for named entity recognition and chunking, and BPE-Mention2Vec for part-of-speech tagging. Feedforward-Mention2Vec predicts tag boundaries and their corresponding types separately. BPE-Mention2Vec first segments words with the Byte Pair Encoding algorithm and then predicts part-of-speech tags for the resulting subword spans. We carefully design experiments to demonstrate the speed and accuracy trade-off across the different models. The empirical results show that the greedy feedforward model can achieve accuracy comparable to recurrent models while running faster, and that Feedforward-Mention2Vec is competitive with the fully structured BiLSTM model for named entity recognition while scaling better with the number of named entity types.
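The Byte Pair Encoding step mentioned in the abstract can be illustrated with a minimal sketch of the standard BPE merge-learning loop (this is a generic illustration of the algorithm, not code from the thesis; the function name `learn_bpe` and the toy word counts are assumptions):

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a word-frequency dict.

    word_freqs: e.g. {"low": 5, "newest": 6}; returns the ordered
    list of symbol pairs merged, most frequent first.
    """
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges
```

In BPE-Mention2Vec, segmenting words into such merged subword units means the tagger can predict part-of-speech tags over subword spans rather than whole-word tokens.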

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
File(s): 
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.