Hua, Ming

Resource type

Thesis

Thesis type

(Thesis)

Date created

2009

Authors/Contributors

Author: Hua, Ming

Abstract

Uncertain data is inherent in many important applications, such as environmental surveillance, market analysis, and quantitative economics research. Because of the importance of those applications and the rapidly increasing amounts of uncertain data being collected and accumulated, analyzing large collections of uncertain data has become an important task. Ranking queries (also known as top-k queries) are often natural and useful in analyzing uncertain data. In this thesis, we study the problem of ranking queries on uncertain data. Specifically, we extend the basic uncertain data model in three directions, namely uncertain data streams, probabilistic linkages, and probabilistic graphs, to meet various application needs. Moreover, we develop a series of novel ranking queries on uncertain data at different granularity levels: selecting the most typical instances within an uncertain object, ranking instances and objects among a set of uncertain objects, and ranking the aggregate sets of uncertain objects. To tackle the challenges of efficiency and scalability, we develop efficient and scalable query evaluation algorithms for the proposed ranking queries. First, we integrate statistical principles and scalable computational techniques to compute exact query results. Second, we develop efficient randomized algorithms to approximate the answers to ranking queries. Third, we propose efficient approximation methods based on the distribution characteristics of query results. A comprehensive empirical study using real and synthetic data sets verifies the effectiveness of the proposed ranking queries and the efficiency of our query evaluation methods.

Copyright statement

Copyright is held by the author.

Language

English

Member of collection

Computing Science Theses

Download file	Size
ETD4827_MHua.pdf	10.56 MB

Ranking queries on uncertain data
