Automatic generation of multilingual sports summaries

Date created: 
2011-05-31
Identifier: 
etd6666
Keywords: 
Natural Language Processing
Natural Language Generation
Bangla
Template
Pipeline
Abstract: 

Natural Language Generation is a subfield of Natural Language Processing concerned with automatically creating human-readable text from non-linguistic forms of information. A template-based approach to Natural Language Generation uses base formats for different types of sentences, which are subsequently transformed into the final readable output. In this thesis, we investigate the suitability of a template-based approach to multilingual Natural Language Generation of sports summaries. We implement a system that generates English and Bangla summaries, using a pipelined architecture to transform data in multiple stages. Additionally, we demonstrate how the automatically generated summaries differ from human-generated summaries. We show that, by using a template-based approach, the system can generate acceptable output in multiple languages without requiring detailed grammatical knowledge, which is important for languages such as Bangla, where computational resources are still scarce.
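
The idea of a template-based, pipelined generator can be illustrated with a minimal sketch. This is not the system described in the thesis: the stage names, the `MatchResult` record, and the English and Bangla templates below are all hypothetical, chosen only to show how per-language templates let the same data produce output in several languages without an explicit grammar. Team names are left untransliterated for simplicity.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Hypothetical non-linguistic input: one football result."""
    home: str
    away: str
    home_goals: int
    away_goals: int


# Illustrative templates only; a real system would need many more,
# plus language-specific handling of names, numbers, and inflection.
TEMPLATES = {
    "en": {
        "win": "{winner} beat {loser} {w}-{l}.",
        "draw": "{home} and {away} drew {w}-{l}.",
    },
    "bn": {
        "win": "{winner} {w}-{l} গোলে {loser}-কে হারিয়েছে।",
        "draw": "{home} ও {away}-এর ম্যাচ {w}-{l} গোলে ড্র হয়েছে।",
    },
}


def select_content(match: MatchResult) -> dict:
    """Stage 1: turn raw data into language-independent facts."""
    if match.home_goals == match.away_goals:
        return {"kind": "draw", "home": match.home, "away": match.away,
                "w": match.home_goals, "l": match.away_goals}
    home_won = match.home_goals > match.away_goals
    return {
        "kind": "win",
        "winner": match.home if home_won else match.away,
        "loser": match.away if home_won else match.home,
        "w": max(match.home_goals, match.away_goals),
        "l": min(match.home_goals, match.away_goals),
    }


def choose_template(facts: dict, lang: str) -> str:
    """Stage 2: pick the base sentence format for this fact and language."""
    return TEMPLATES[lang][facts["kind"]]


def realize(template: str, facts: dict) -> str:
    """Stage 3: fill the template slots to produce the final sentence."""
    return template.format(**facts)


def summarize(match: MatchResult, lang: str) -> str:
    """Run the three stages in order."""
    facts = select_content(match)
    return realize(choose_template(facts, lang), facts)


if __name__ == "__main__":
    result = MatchResult("Arsenal", "Chelsea", 2, 1)
    for lang in ("en", "bn"):
        print(summarize(result, lang))
```

In this sketch, supporting another language means adding another entry to `TEMPLATES`; the content-selection stage stays unchanged because it works only on language-independent facts.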

Document type: 
Thesis
Rights: 
Copyright remains with the author. The author granted permission for the file to be printed and for the text to be copied and pasted.
Supervisor(s): 
Fred Popowich
Department: 
Applied Science: School of Computing Science
Thesis type: 
M.Sc. Thesis (Computing Science)