Data mining for feature vector networks

Thesis type
(Thesis) Ph.D.
Network data arise in various contexts. Examples include online friendship networks, email exchange networks and genetic interaction networks. In this dissertation, we are particularly interested in feature vector networks (FVNs). FVNs are networks in which the nodes carry additional information in the form of feature vectors. In a social network, for example, the age or economic status of a person might be part of the feature vector. We investigate three research problems related to FVNs: clustering of FVNs, pattern mining in FVNs and simulation models for FVNs. Clustering of nodes in FVNs is relevant, for example, in advertising on online social networks, where feature vectors could reflect recent purchases of users. The number of clusters in these networks is often unknown. We introduce a new approach that simultaneously identifies the number of clusters and the clusters themselves. Our objective function takes into account not only the network structure but also the feature vectors. This partitioning clustering algorithm produces better results than current state-of-the-art methods. Our pattern mining approach finds the set of all cohesive patterns. A cohesive pattern is defined as a connected, dense subgraph whose nodes have similar feature vectors in a sufficiently large subspace. This definition is motivated by small communities in social networks and by modules in protein-protein interaction networks. Our dense graph mining algorithm is the first to guarantee a complete result for density thresholds above 1/3. Additional constraints on feature vectors and connectivity reduce the number of patterns and make the remaining patterns more meaningful. The lack of both publicly available FVNs and FVN simulation models motivates us to create a framework for simulating FVNs. We present a framework based on the previously introduced Latent Socio-Spatial Process (LSSP) model and on our extension, the nLSSP model. The nLSSP model uses two dampening functions to deal with effects related to the network size. Both models can be used to predict links in FVNs, and the model parameters are estimated using Markov Chain Monte Carlo.
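The cohesive pattern definition in the abstract (connected, dense, similar feature vectors in a sufficiently large subspace) can be illustrated with a small check on a candidate node set. This is only a sketch under stated assumptions, not the dissertation's mining algorithm: the function name `is_cohesive`, the thresholds `min_density`, `max_width` and `min_dims`, the use of edge density over all node pairs, and the per-dimension value range as the similarity measure are all hypothetical choices made for illustration.

```python
def is_cohesive(nodes, edges, features, min_density, max_width, min_dims):
    """Test one candidate node set against three illustrative conditions.

    All thresholds are hypothetical parameters; the thesis may define
    density and feature similarity differently.
    """
    nodes = set(nodes)
    if len(nodes) < 2:
        return False

    # Edges induced by the candidate set, stored as unordered pairs.
    induced = {frozenset(e) for e in edges
               if e[0] != e[1] and set(e) <= nodes}

    # Density (assumed here): fraction of all node pairs that are linked.
    possible = len(nodes) * (len(nodes) - 1) // 2
    if len(induced) / possible < min_density:
        return False

    # Connectivity: depth-first search over the induced subgraph.
    adj = {n: set() for n in nodes}
    for e in induced:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    if seen != nodes:
        return False

    # Subspace condition (assumed here): count the dimensions in which
    # all nodes lie within a value range of at most max_width.
    n_dims = len(next(iter(features.values())))
    similar_dims = 0
    for d in range(n_dims):
        values = [features[n][d] for n in nodes]
        if max(values) - min(values) <= max_width:
            similar_dims += 1
    return similar_dims >= min_dims


# Toy FVN: a triangle {1, 2, 3} with node 4 attached to node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
features = {1: (0.10, 5.0), 2: (0.20, 9.0), 3: (0.15, 1.0), 4: (0.90, 1.2)}

print(is_cohesive({1, 2, 3}, edges, features, 0.8, 0.2, 1))
print(is_cohesive({1, 2, 3, 4}, edges, features, 0.8, 0.2, 1))
```

On this toy network the triangle passes all three conditions, while adding node 4 lowers the density to 4/6 and the set is rejected. A check like this only validates a single candidate; enumerating all cohesive patterns completely is the harder problem the abstract refers to.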
Copyright statement
Copyright is held by the author.
Attachment Size
ETD4841.pdf 5.81 MB