Sampson, Rylen

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2023-11-06

Authors/Contributors

Author: Sampson, Rylen

Abstract

Data-to-text generation, a subfield of natural language generation, increases the usability of structured data and knowledge bases. However, data-to-text generation datasets are not readily available in most domains, and those that exist are often very small. One solution is to collect more data, though this is usually not a straightforward option. Alternatively, data augmentation comprises strategies that artificially enlarge the training data by incorporating slightly varied copies of the original data, diversifying a dataset that is otherwise lacking. This work investigates augmentation as a remedy for training data-to-text generation models on small datasets. Natural language generation metrics are used to assess the quality of the generated text with and without augmentation. Experiments demonstrated that, with augmentation, models achieved equal performance despite the generated text exhibiting different properties. This suggests that data augmentation can be a useful step in training data-to-text generation models with limited data.

Extent

64 pages.

Keywords

Identifier

etd22796

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Popowich, Fred

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd22796.pdf	659.86 KB

Data augmentation for text generation from structured data
