Resource type
Thesis type
(Thesis) Ph.D.
Date created
2009
Authors/Contributors
Author: Liu, Yudong
Abstract
The predicate-argument structure (PAS) of a natural language sentence is a useful representation that can be used for a deeper analysis of the underlying meaning of the sentence or directly used in various natural language processing (NLP) applications. The task of semantic role labeling (SRL) is to identify the predicate-argument structures and label the relations between the predicate and each of its arguments. Researchers have been studying SRL as a machine learning problem in the past six years, after large-scale semantically annotated corpora such as FrameNet and PropBank were released to the research community. Lexicalized Tree Adjoining Grammars (LTAGs), a tree rewriting formalism, are often a convenient representation for capturing locality of predicate-argument relations. Our work in this thesis is focused on the development and learning of the state of the art discriminative SRL systems with LTAGs. Our contributions to this field include: We apply to the SRL task a variant of the LTAG formalism called LTAG-spinal and the associated LTAG-spinal Treebank (the formalism and the Treebank were created by Libin Shen). Predicate-argument relations that are either implicit or absent from the original Penn Treebank are made explicit and accessible in the LTAG-spinal Treebank, which we show to be a useful resource for SRL. We propose the use of the LTAGs as an important additional source of features for the SRL task. Our experiments show that, compared with the best-known set of features that are used in state of the art SRL systems, LTAG-based features can improve SRL performance significantly. We treat multiple LTAG derivation trees as latent features for SRL and introduce a novel learning framework -- Latent Support Vector Machines (LSVMs) to the SRL task using these latent features. This method significantly outperforms state of the art SRL systems. In addition, we adapt an SRL framework to a real-world ternary relation extraction task in the biomedical domain. Our experiments show that the use of SRL related features significantly improves performance over the system using only shallow word-based features.
Document
Copyright statement
Copyright is held by the author.
Scholarly level
Language
English
Member of collection
Download file | Size |
---|---|
ETD4905.pdf | 1.17 MB |