Bidirectional segmentation for English-Korean machine translation

Author: 
Date created: 
2012-04-02
Identifier: 
etd7145
Keywords: 
Machine translation
Segmentation
Parallel corpus
Bidirectional
Morphology
Abstract: 

Unlike English or Spanish, in which words are clearly delimited, morphologically rich languages such as Korean do not have clear optimal word boundaries for machine translation (MT). Previous work has shown that segmenting such languages using information available from a parallel corpus can improve MT results. In this thesis we show that results can be improved further by segmenting both the source and the target languages, and we report an improvement in BLEU score from 3.13 to 3.46 for English-Korean translation.

Document type: 
Thesis
Rights: 
Copyright remains with the author. The author granted permission for the file to be printed and for the text to be copied and pasted.
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Science: School of Computing Science
Thesis type: 
(Thesis) M.Sc.