Analysis of Data in Network and Natural Language Formats

Resource type
Thesis type
(Thesis) Ph.D.
Date created
The work herein describes a predictive model for cricket matches, a method of evaluating cricket players, and a method to infer properties of a network from a link-traced sample. In Chapter 2, player characteristics are estimated using a frequency count of the outcomes that occur when that player is batting or bowling. These characteristics are weighted against the relative propensity of each outcome in each of 200 game situations (10 wickets times 20 overs), and incorporate prior information using a Metropolis-Hastings algorithm. The characteristics of players in selected team rosters are then fed into a system we developed to simulate outcomes of whole games. The winning probabilities of each team are shown to perform similarly to competitive betting lines during the 2014 Cricket World Cup. In Chapter 3 the simulator is used to estimate the effect, in terms of expected number of runs, of each player. The effect of the player is reported as expected runs scored or allowed per innings above an average player in the same batting or bowling position. Chapter 4 proposes a method based on approximate Bayesian computation (ABC) to make inferences on hidden parameters of a network graph. Network inference using ABC is a very new field. This is the first work, to the author’s knowledge, of an ABC based inference using only a sample of a network, rather than the either network. Summary statistics are taken from the sample of the network of interest, networks and samples are then simulated using hidden parameters from a prior distribution, and a posterior of the parameters is found by a kernel density estimate conditioned on the summary statistics. Chapter 5 describes an application of the method proposed in Chapter 4 to real data. A network of precedence citations between legal documents, centered around cases overseen by the Supreme Court of Canada, is observed. The features of certain cases that lead to their frequent citation are inferred, and their effects estimated by ABC. Future work and extensions are briefly discussed in Chapter 6.
Copyright statement
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Thompson, Steven
Attachment Size
etd9742_MDavis.pdf 1.33 MB