Scalable statistical-relational model discovery

Thesis type
(Thesis) M.Sc.
Date created
Author: Mar, Richard
Many organisations store large amounts of data in relational databases and need efficient ways to extract useful information from them. Machine learning models learned from these databases enable intelligent queries to be answered. Typically, these models require sufficient statistics in the form of frequency counts, which are efficiently captured by a contingency table (ct-table). Several techniques have been developed to generate ct-tables from a single table; however, multi-relational databases pose unique challenges that make these solutions unsuitable. In particular, the data is spread across multiple tables, which must be joined to determine the correct frequency counts. In addition, counts for non-existent relationships must be inferred because they are not explicitly stored in the database. This thesis presents a novel hybrid-counting (HYBRID) approach to computing ct-tables from relational databases. It combines the pre-counting (PRECOUNT) and post-counting (ONDEMAND) methods in a way that addresses the weaknesses of each.
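To make the two challenges named in the abstract concrete, the following minimal Python sketch builds a ct-table for a toy two-entity database. It is an illustration only, not the HYBRID, PRECOUNT, or ONDEMAND method of the thesis. The tables (`students`, `courses`, `registered`), their attribute values, and the function name `ct_table` are hypothetical. Counts for existing relationships come from a join. Counts for non-existent relationships are obtained by subtracting the joined counts from the counts over all entity pairs, which is one simple way to infer them and is assumed here rather than taken from the thesis.

```python
from collections import Counter
from itertools import product

# Hypothetical toy database: two entity tables and one relationship table.
students = {"s1": "high", "s2": "low", "s3": "high"}     # student id -> intelligence
courses = {"c1": "hard", "c2": "easy"}                    # course id -> difficulty
registered = {("s1", "c1"), ("s2", "c2"), ("s3", "c1")}   # stored (student, course) links


def ct_table(students, courses, registered):
    """Return counts keyed by (intelligence, difficulty, link_exists)."""
    # Existing relationships: join the relationship table to both entity tables.
    positive = Counter((students[s], courses[c]) for s, c in registered)

    # All student-course pairs, linked or not: product of the marginal counts.
    s_counts = Counter(students.values())
    c_counts = Counter(courses.values())

    table = {}
    for intel, diff in product(s_counts, c_counts):
        total = s_counts[intel] * c_counts[diff]
        pos = positive[(intel, diff)]
        table[(intel, diff, True)] = pos
        # Non-existent links are never stored, so infer them by subtraction.
        table[(intel, diff, False)] = total - pos
    return table


if __name__ == "__main__":
    for key, count in sorted(ct_table(students, courses, registered).items()):
        print(key, count)
```

In this toy data the counts over all rows sum to the number of student-course pairs (3 × 2 = 6), which is a quick sanity check that no pair was counted twice or missed. On a real database the join and the cross product are what become expensive, which is the cost trade-off that pre-counting and on-demand counting approach differently.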
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Schulte, Oliver
Member of collection