
Data augmentation for text generation from structured data

Thesis type
(Thesis) M.Sc.
Abstract
Data-to-text generation, a subfield of natural language generation, increases the usability of structured data and knowledge bases. However, data-to-text generation datasets are not readily available in most domains, and those that exist are very small. One solution is to collect more data, though this is usually not a straightforward option. Alternatively, data augmentation comprises strategies that artificially enlarge the training data by adding slightly varied copies of the original examples, diversifying a dataset that is otherwise lacking. This work investigates augmentation as a remedy for training data-to-text generation models on small datasets. Natural language generation metrics are used to assess the quality of the generated text with and without augmentation. Experiments demonstrated that, with augmentation, models achieved equal performance even though the generated text exhibited different properties. This suggests that data augmentation can be a useful step in training data-to-text generation models with limited data.
64 pages.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Popowich, Fred
File: etd22796.pdf (659.86 KB)
