Upgrading Bayesian Network Scores for Multi-Relational Data

Thesis type
(Thesis) M.Sc.
A model selection score measures how well a model fits a dataset. We describe a new method for extending, or upgrading, a Bayesian network (BN) score designed for single-table i.i.d. data to multi-relational datasets (databases). A multi-relational BN model provides an integrated statistical analysis of the interdependent data tables in a database. We focus on log-linear model selection scores. Our key theoretical desideratum is preserving consistency: if a model selection score is statistically consistent for single-table data, then the upgraded score should be statistically consistent for relational data as well. It is difficult, if not impossible, to define an upgrade method for log-linear model scores that preserves consistency if the upgraded score is a function of a single model only. We therefore develop a novel approach in which the upgraded model score is a gain function that compares two models: a current and an alternative BN structure. Our main theorem establishes that model search based on our upgraded gain function preserves consistency. Empirical evaluation on six benchmark relational databases shows that our upgraded scores select informative BN structures that strike a balance between overly sparse and overly dense graphs.
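The following is a minimal sketch of the idea of a gain function that compares a current and an alternative BN structure. It illustrates only the single-table i.i.d. starting point. It assumes a BIC-style score, which is the maximum log-likelihood minus a complexity penalty of (log N / 2) times the number of parameters. BIC is a standard consistent score for discrete single-table BN structure learning. In this setting, a gain function can be obtained simply as the difference of two per-model scores.

The thesis's upgraded relational gain function is not reproduced here. All function names (`family_loglik`, `num_params`, `bic`, `gain`, `best_move`) are hypothetical. Variable cardinalities are estimated from the values observed in the data, which is also an assumption.

```python
import math
from collections import Counter

def family_loglik(data, child, parents):
    """Maximum-likelihood log-likelihood of `child` given its `parents`.

    `data` is a list of dicts (one dict per row, one key per variable).
    """
    joint = Counter()          # counts of (parent configuration, child value)
    parent_counts = Counter()  # counts of parent configurations
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        parent_counts[pa] += 1
    return sum(n * math.log(n / parent_counts[pa]) for (pa, _), n in joint.items())

def num_params(data, child, parents):
    """Free parameters of the child's conditional table (cardinalities taken from observed values)."""
    child_card = len({row[child] for row in data})
    parent_card = 1
    for p in parents:
        parent_card *= len({row[p] for row in data})
    return (child_card - 1) * parent_card

def bic(data, structure):
    """BIC score of a BN structure given as {child: tuple_of_parents}."""
    n = len(data)
    score = 0.0
    for child, parents in structure.items():
        score += family_loglik(data, child, parents)
        score -= 0.5 * math.log(n) * num_params(data, child, parents)
    return score

def gain(data, current, alternative):
    """Hypothetical single-table gain: positive if `alternative` scores higher than `current`."""
    return bic(data, alternative) - bic(data, current)

def best_move(data, current, candidates):
    """Return the candidate structure with the largest positive gain, or None if none improves."""
    best, best_gain = None, 0.0
    for alt in candidates:
        g = gain(data, current, alt)
        if g > best_gain:
            best, best_gain = alt, g
    return best
```

For example, with 100 rows in which `B` always equals `A`, the gain of adding the edge `A -> B` is positive. With 100 rows in which `A` and `B` are independent, the gain of adding the same edge is negative, because the log-likelihood does not improve and the penalty grows.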
Copyright statement
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Schulte, Oliver
Attachment: etd9823_SGholami.pdf (917.11 KB)