Resource type
Thesis type
(Thesis) Ph.D.
Date created
2011-05-06
Authors/Contributors
Author: Zhou, Bin
Abstract
In recent years, the success of Web search engines has demonstrated the simplicity and power of keyword search over billions of textual pages on the Web. Beyond textual pages, many applications maintain vast collections of structured, semi-structured, and unstructured data that contain abundant text information. It is therefore natural to extend keyword search to retrieve information from such large-scale data. In this thesis, we study a class of important and challenging problems centered on keyword search on large-scale data. We propose techniques for several important types of data sources: relational tables, graphs, and search logs. For relational tables, we show that searching individual tuples using keywords, while useful, may return no results in some scenarios because the query keywords may be distributed across different tuples. To enhance keyword search on relational tables, we develop the aggregate keyword search method, which finds aggregate groups of tuples that jointly match a set of query keywords. For graphs, we observe that keyword queries are often ambiguous, so efficient and effective query suggestion techniques are crucial to providing a satisfactory user search experience. We extend the query suggestion technique used in Web search to help users conduct keyword search on graphs. For search logs, we study various keyword search applications in Web search engines and conclude that all of them are related to several novel mining functions on search logs. We build a universal OLAP infrastructure on search logs that supports scalable online query suggestion. The proposed techniques have several desirable characteristics that are useful in different application scenarios.
We evaluate our approaches on a broad range of real and synthetic data sets and demonstrate that the techniques achieve high performance. We also provide an overview of the keyword search problem on large-scale data, survey the literature in the field, and discuss potential directions for future research.
Identifier
etd6635
Copyright statement
Copyright is held by the author.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Pei, Jian
Member of collection
Download file | Size |
---|---|
etd6635_BZhou.pdf | 963.98 KB |