Scalable statistical-relational model discovery

Thesis type
(Thesis) M.Sc.
Date created
2021-08-04
Authors/Contributors
Author: Mar, Richard
Abstract
Many organisations store large amounts of data in relational databases and need efficient ways to extract useful information from them. Machine learning models learned from these databases allow intelligent queries to be answered. Typically, these models require sufficient statistics in the form of frequency counts, which are efficiently captured by a contingency table (ct-table). Several techniques have been developed to generate ct-tables from a single table; however, multi-relational databases pose unique challenges that make these solutions unsuitable. In particular, the data is spread across multiple tables, which must be joined to determine the correct frequency counts. In addition, counts for non-existent relationships must be inferred because they are not explicitly stored in the database. This thesis presents a novel hybrid-counting (HYBRID) approach to computing ct-tables from relational databases. It combines pre-counting (PRECOUNT) and post-counting (ONDEMAND) methods to address the weaknesses of both.
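The two challenges named in the abstract (joining tables to count existing relationships, and inferring counts for relationships that are not stored) can be illustrated with a minimal sketch. It is not the thesis's HYBRID, PRECOUNT, or ONDEMAND implementation. The toy tables, the attribute names, and the function `ct_table` are all hypothetical, and the subtraction step is just one simple way to obtain the missing counts.

```python
from collections import Counter
from itertools import product

# Hypothetical toy database: two entity tables and one relationship table.
students = {"s1": "high", "s2": "low", "s3": "high"}     # student id -> intelligence
courses = {"c1": "hard", "c2": "easy"}                    # course id -> difficulty
registered = {("s1", "c1"), ("s1", "c2"), ("s2", "c2")}   # stored (student, course) links


def ct_table(students, courses, registered):
    """Return counts keyed by (intelligence, difficulty, link exists?)."""
    # Existing relationships: join the link table to both entity tables.
    true_counts = Counter((students[s], courses[c]) for s, c in registered)

    # Counts that ignore the relationship: multiply per-table value counts
    # instead of materialising the full student x course cross product.
    student_counts = Counter(students.values())
    course_counts = Counter(courses.values())

    table = {}
    for (i, n_i), (d, n_d) in product(student_counts.items(), course_counts.items()):
        total = n_i * n_d
        linked = true_counts.get((i, d), 0)
        table[(i, d, True)] = linked
        # Non-existing links are not stored, so their count is inferred by subtraction.
        table[(i, d, False)] = total - linked
    return table


table = ct_table(students, courses, registered)
for key in sorted(table):
    print(key, table[key])
```

Every student-course pair falls into exactly one row of the result, so the counts sum to the number of students times the number of courses.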
Identifier
etd21593
Copyright statement
Copyright is held by the author(s).
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Schulte, Oliver
Language
English