Triangulation refers to the use of a pivot language when translating from a source language to a target language. Previous research in triangulation has only focused on large corpora in the same domain. This thesis conducts the ﬁrst in-depth study on the use of triangulation for four real-world low-resource languages with realistic data settings, Mawukakan, Maninkakan, Haitian Kreyol and Malagasy, where ﬂuent translations using statistical machine translation are diﬃcult to obtain due to limited amounts of training data in the source-target language pair. We compare and contrast several design choices one needs to consider when using triangulation. We observe that triangulation via French improves translations signiﬁcantly for Mawukakan and Maninkakan, two languages spoken in West Africa. We also improve translations for real-world short messages sent in the aftermath of the Haiti earthquake in 2010 and news articles in Malagasy. As part of the dissertation, we build the ﬁrst eﬀective translation system for the ﬁrst two of these languages and outperform the state-of-the-art for Haitian Kreyol. We improve translation quality by injecting more data via pivot languages and show that in realistic data settings carefully considering triangulation design options is important. Furthermore, in all four languages since the low-resource language pair and pivot language pair data typically come from very diﬀerent domains, we propose a novel iterative method to ﬁne-tune the weighted mixture of direct and pivot based phrase pairs to signiﬁcantly improve translation quality.
Copyright is held by the author.
The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact firstname.lastname@example.org.
Supervisor or Senior Supervisor
Thesis advisor: Sarkar, Anoop
Member of collection