Uncertain data has accumulated rapidly in many important applications, such as sensor networks, market analysis, and social networks. Analyzing large collections of uncertain data has become an essential task. Generally, uncertainty means a lack of certainty due to limited knowledge of the data being examined. An uncertain object cannot be described exactly by one state. Instead, it has more than one possible representation. Therefore, we model an uncertain data set as a set of uncertain objects, each of which has a set of instances, in a domain consisting of multiple attributes. In this thesis, we put emphasis on summarizing certainty in uncertain data. We systematically identify three types of uncertainty, namely, value uncertainty, membership uncertainty, and relationship uncertainty, at the levels of objects, instances, and domains of uncertain data. In particular, we develop techniques for clustering uncertain objects to summarize objects, detecting outlying instances to summarize instances, and learning domain orders to summarize domains. Technically, we combine statistical analysis and data mining techniques to investigate uncertain data. We develop efficient and scalable algorithms to tackle the computational challenges of large uncertain data sets. We also conduct comprehensive empirical studies on real and synthetic data sets to verify the effectiveness of the proposed summarization techniques and the efficiency of our algorithms.
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Thesis advisor: Pei, Jian