Multi-Relational Learning with SQL All the Way

Date created: 
Statistical Relational Learning(SRL)
Multi-Relational Database
Sufficient Statistics
Log-Linear Model
Bayesian networks (BNs)
Dependency Networks (DNs)
Link Analysis
Generative Modelling
Discriminative Learning.

Which doctors prescribe which drugs to which patients? Who upvotes which answers on what topics on Quora? Who has followed whom on Twitter/Weibo? These relationships are all visible in data, and they all contain a wealth of information that could be extracted to be knowledge/wisdom. Statistical Relational Learning (SRL) is a recent growing field which extends traditional machine learning from single-table to multiple inter-related tables. It aims to provide integrated statistical analysis of heterogeneous and interdependent complex data. In the thesis, I focus on modelling the interactions between different attributes and the link itself for such complex heterogeneous and richly interconnected data. First, I describe the FactorBase system which combines advanced analytics from statistical-relational machine learning (SRL) with database systems. Within FactorBase, all statistical objects are stored as first-class citizens as well as raw data. This new SQL-based framework pushes the multi-relational model discovery into a relational database management system. Secondly, to solve the scalability issue of computing cross-table sufficient statistics, a new Virtual Join algorithm is proposed and implemented in FactorBase. Bayesian networks (BNs) and Dependency Networks (DNs) are two major classes of SRL. Thirdly, I utilize FactorBase to extend the state-of-the-art learning algorithm for BN of generative modelling with link uncertainty. The learned model captures correlations between link types, link features, and attributes of nodes, simultaneously. Finally, a fast hybrid approach is proposed for instance level discriminative learning of DNs with competitive predictive power but substantially better scalability.

Document type: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Senior supervisor: 
Oliver Schulte
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.