Properties of prefix lexicalized synchronous grammars

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
Formal language theory
Synchronous grammars
Context-free grammars
Tree-adjoining grammars
Weighted grammars
Greibach normal form

Synchronous grammars, which may be broadly characterized as sets of rules for generating sentence pairs, are used in linguistics and natural language processing (NLP) as a framework for modelling human languages. Prefix lexicalization is a property of some grammars in which each rule contains a terminal symbol (also called a lexical item) at its left edge. This work shows that the class of synchronous context-free grammars (SCFG) is not closed under prefix lexicalization, and presents an algorithm for converting a finitely ambiguous, epsilon-free SCFG to a weakly equivalent prefix lexicalized synchronous tree-adjoining grammar (STAG). This transformation only increases the rank of the grammar by 1; it at most cubes the grammar size, but we provide empirical results showing that the size increase is generally much smaller. We show that this result can also be extended to weighted synchronous grammars without affecting the weights they assign to string pairs. Theoretically, these results serve to generalize extended Greibach normal form from CFGs to SCFGs, and practically they have potential applications in machine translation. They also serve as an avenue for investigating the properties of related grammar formalisms in future work.

Document type: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Anoop Sarkar
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.