Resource type
Thesis type
(Thesis) M.Sc.
Date created
2012-04-02
Authors/Contributors
Author: Kim, Youngchan
Abstract
Unlike English or Spanish, in which words are clearly segmented, morphologically rich languages such as Korean do not have clear optimal word boundaries for machine translation (MT). Previous work has shown that segmenting such languages using information available from a parallel corpus can improve MT results. In this thesis we show that results can be improved further by segmenting both the source and target languages, and we report an improvement in BLEU score from 3.13 to 3.46 for English-Korean translation.
Document
Identifier
etd7145
Copyright statement
Copyright is held by the author.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Sarkar, Anoop
Member of collection
Download file | Size |
---|---|
etd7145_YKim.pdf | 2.07 MB |