Summarizing certainty in uncertain data

Resource type
Thesis type
(Thesis) Ph.D.
Date created
2011-05-24
Authors/Contributors
Author: Jiang, Bin
Abstract
Uncertain data has accumulated rapidly in many important applications, such as sensor networks, market analysis, and social networks. Analyzing large collections of uncertain data has become an essential task. Generally, uncertainty means the lack of certainty due to having limited knowledge of the data being examined. An uncertain object cannot be described exactly in one state. Instead, it has more than one possible representation. Therefore, we model an uncertain data set as a set of uncertain objects, each of which has a set of instances, in a domain consisting of multiple attributes. In this thesis, we put emphasis on summarizing certainty in uncertain data. We systematically identify three types of uncertainty, namely, value uncertainty, membership uncertainty, and relationship uncertainty, at the levels of objects, instances, and domains of uncertain data. In particular, we develop techniques for clustering uncertain objects to summarize objects, detecting outlying instances to summarize instances, and learning domain orders to summarize domains. Technically, we combine statistical analysis and data mining techniques to investigate uncertain data. We develop efficient and scalable algorithms to tackle the computational challenges of large uncertain data sets. We also conduct comprehensive empirical studies on real and synthetic data sets to verify the effectiveness of the proposed summarization techniques and the efficiency of our algorithms.
Document
Identifier
etd6646
Copyright statement
Copyright is held by the author.
Permissions
The author granted permission for the file to be printed and for the text to be copied and pasted.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Pei, Jian
Member of collection
Attachment Size
etd6646_BJiang.pdf 1.52 MB