Improvements in hierarchical phrase-based Statistical Machine Translation

Date created: 
Statistical machine translation, Hierarchical phrase-based SMT, Hiero, Bayesian grammar induction, Unified-cascade framework, Multi-metric optimization

Hierarchical phrase-based translation (Hiero) is a statistical machine translation (SMT) model that encodes translation as a synchronous context-free grammar derivation between source and target language strings (Chiang, 2005; Chiang, 2007). Hiero models are more powerful than phrase-based models in capturing complex source-target reordering as well as discontiguous phrases, while being easier to estimate and decode with compared to their full syntax-based counterparts. In this thesis, we propose improvements to two broad aspects of the Hiero translation pipeline: i) learning Hiero translation model and estimating their parameters and ii) parameter tuning for discriminative log-linear models that are used to decode with such features. We use our own open-source implementation of Hiero called Kriya (Sankaran et al., 2012b) for all the experiments in this thesis. This thesis contains the following specific contributions: We propose a Bayesian model for learning Hiero grammars as an alternative to the heuristic method usually used in Hiero. Our model learns a peaked distribution of grammars, which consistently performs better than the heuristically extracted grammars across several language pairs (Sankaran et al., 2013a). We propose a novel unified-cascade framework for jointly learning alignments and the Hiero translation rules by removing the disconnect between the alignments and extracted synchronous context-free grammar. This is the first time a joint training framework is being proposed for Hiero, where we iterate the two step inference so that it learns in alternate iterations the phrase alignments and then the Hiero rules that are consistent with alignments. We extend our Bayesian model for extracting compact Hiero translation rules using arity-1 grammars, resulting in up to 57% reduction in model size while retaining the translation performance (Sankaran et al., 2011; Sankaran et al., 2012a). We propose several novel approaches for parameter tuning of discriminative log-linear models for SMT which can be used for jointly optimizing towards multiple evaluation metrics. We show that our methods for multi-objective tuning for SMT yield substantial gains in translation quality measured through automatic as well as human evaluations (Sankaran et al., 2013b; Duh et al., 2013).

Document type: 
Copyright remains with the author. The author granted permission for the file to be printed and for the text to be copied and pasted.
Senior supervisor: 
Anoop Sarkar
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.