Resource type
Thesis type
(Thesis) M.Sc.
Date created
2016-09-20
Authors/Contributors
Author: Gholami, Sajjad
Abstract
A model selection score measures how well a model fits a dataset. We describe a new method for extending, or upgrading, a Bayesian network (BN) score designed for single-table i.i.d. data to multi-relational datasets (databases). A multi-relational BN model provides an integrated statistical analysis of the interdependent data tables in a database. We focus on log-linear model selection scores. Our key theoretical desideratum is preserving consistency: if a model selection score is statistically consistent for single-table data, then the upgraded score should be statistically consistent for relational data as well. It is difficult, if not impossible, to define an upgrade method for log-linear model scores that preserves consistency if the upgraded score is a function of a single model only. We therefore develop a novel approach where an upgraded model score is a gain function that compares two models: a current vs. an alternative BN structure. Our main theorem establishes that model search based on our upgraded gain function preserves consistency. Empirical evaluation on six benchmark relational databases shows that our upgraded scores select an informative BN structure that strikes a balance between overly sparse and overly dense graph structures.
Identifier
etd9823
Copyright statement
Copyright is held by the author.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Schulte, Oliver
Member of collection
Download file | Size |
---|---|
etd9823_SGholami.pdf | 917.11 KB |