Compressing social networks can substantially facilitate mining and advanced analysis of large social networks. Preferably, social networks should be compressed in a way that they still can be queried efficiently without decompression. Arguably, neighbor queries, which search for all neighbors of a query vertex, are the most essential operations on social networks. In this thesis we develop a social network compression framework, which allows efficient in-neighbor and out-neighbor query answering without replicating the data. As a simple setting of the framework, we introduce the novel Eulerian data structure and prove an upper bound on its bits-per-edge rate. Using the notion of k-linearization, a lossless compression scheme is introduced as a practical compression scheme. The experiments on a dozen real world social networks confirm the effectiveness of our design. Finally, we use our framework to develop a lossy compresson scheme, which aims at capturing the community structure of a given network using a predefined amount of storage. We affirmatively evaluate the lossy compression scheme on both synthesis and real world social networks.
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Supervisor or Senior Supervisor
Thesis advisor: Pei, Jian
Member of collection