Automatic transliteration from Arabic to English and its impact on machine translation

Resource type
Thesis type
(Thesis) M.Sc.
Date created
2007
Authors/Contributors
Abstract
Transcribing named entities from one language into another is called transliteration. This thesis proposes a novel spelling-based method for the automatic transliteration of named entities from Arabic to English that exploits several types of letter-based alignments. The approach consists of three phases: the first uses single-letter alignments; the second uses alignments over groups of letters to deal with diacritics and missing vowels in the English output; and the third exploits various knowledge sources to repair any remaining errors. The results show a top-20 accuracy of up to 88%. The algorithm is also examined in the context of a machine translation task. We provide an in-depth analysis of the integration of our Arabic-to-English transliteration system into a general-purpose phrase-based statistical machine translation system. Our experiments show that a transliteration module helps significantly when the test data is rich in previously unseen named entities.
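The top-20 figure quoted in the abstract is most naturally read as the fraction of test names whose reference English spelling appears among the system's 20 highest-ranked candidates. The abstract does not spell out this definition, so the sketch below is an illustration under that assumption, not the thesis's evaluation code. The function name `top_k_accuracy`, the case-insensitive comparison, and the toy data are all hypothetical.

```python
from typing import Dict, List


def top_k_accuracy(candidates: Dict[str, List[str]],
                   references: Dict[str, str],
                   k: int = 20) -> float:
    """Fraction of source names whose reference is among the first k candidates.

    `candidates` maps each source name to its ranked list of transliterations
    (best first); `references` maps each source name to one gold spelling.
    Matching is case-insensitive, which is an assumption of this sketch.
    """
    if not references:
        raise ValueError("references must not be empty")
    hits = 0
    for source, reference in references.items():
        ranked = candidates.get(source, [])
        # Only the first k ranked candidates count toward a hit.
        if reference.lower() in (c.lower() for c in ranked[:k]):
            hits += 1
    return hits / len(references)


# Toy data invented for illustration only; it is not taken from the thesis.
toy_candidates = {
    "mhmd": ["mohamed", "muhammad", "mohammed"],
    "krym": ["karim", "kareem"],
}
toy_references = {"mhmd": "muhammad", "krym": "kerim"}

score_at_20 = top_k_accuracy(toy_candidates, toy_references, k=20)
score_at_1 = top_k_accuracy(toy_candidates, toy_references, k=1)
```

With this toy data, one of the two references is found within the candidate lists, and neither is ranked first, so the score depends on the cutoff `k`. A larger `k` can only keep or raise the score, which is why top-20 accuracy is an upper bound on top-1 accuracy for the same system.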
Document
Copyright statement
Copyright is held by the author.
Permissions
The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact summit-permissions@sfu.ca.
Scholarly level
Language
English
Member of collection
Download file
etd3203.pdf (771.98 KB)
