Resource type
Thesis type
(Thesis) Ph.D.
Date created
2023-10-23
Authors/Contributors
Author: Epasinghege Dona, Nirodha
Abstract
This thesis consists of five distinct chapters; each considers various applications within the domain of big data. In the opening chapter, an application of genetics is presented. It discusses how to simulate exome-sequencing data for 150 families from a North American admixed population, each containing at least four members affected with lymphoid cancer. These data encompass details regarding the ascertained families, along with information about single-nucleotide variants found in the exome of the affected family members. The subsequent chapters focus on sports analytics through the lens of big data applications. In the second chapter, the expected goals concept is extended to limited overs cricket where ideas are illustrated using the economy rate statistic. The approach is based on the estimation of batting outcome probabilities given detailed data on each ball that is bowled in a match. Through the utilization of machine learning techniques, estimation of batting outcomes is carried out. From the analysis, distinctions between men's and women's T20 cricket are observed. One such finding is that there is a higher frequency of sixes occurring in the men's game than in the women's game. In the third chapter, the focus shifts to examining the issue of pace of play in soccer. In this study, the key question revolves around whether employing a fast-paced playing style offers an advantageous strategy in the game. This is a question that remains insufficiently addressed in both soccer and hockey. The investigation is enabled through the utilization of tracking data which provides the locations of players measured at frequent intervals (i.e. 10 times per second). The chapter begins by formulating a definition of pace. In this study, we use methods of causal inference to investigate the relationship between pace in soccer and shots.
The analysis reveals that maintaining a higher pace than the opponent throughout a match results in an advantage of approximately two additional shots per game. The fourth chapter entails an assessment of the optimal locations for throw-ins in soccer. The investigation is also enabled through the utilization of tracking data which provides the locations of players measured at frequent intervals (i.e. 10 times per second). The methods for the investigation are necessarily causal since there are confounding variables that impact both the throw-in location and the result of the throw-in. A simple causal analysis indicates that on average, backwards throw-ins are beneficial and lead to an extra 2.5 shots per 100 throw-ins. We also observe that there is a benefit to long throw-ins where on average, they result in roughly 4.0 more shots per 100 throw-ins. These results are confirmed by a more complex causal analysis that relies on the spatial structure of throw-ins. The last chapter proposes increasingly complex models based on publicly available data involving rally length in tennis. The models provide insights regarding player characteristics involving the ability to extend rallies and relate these characteristics to performance measures. The analysis highlights some important features that make a difference between winning and losing, and therefore provides feedback on how players may improve. Bayesian models are introduced where posterior estimation is carried out using Markov chain Monte Carlo methods.
Document
Extent
156 pages.
Identifier
etd22923
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Swartz, Tim
Thesis advisor: Graham, Jinko
Language
English
Member of collection
| Download file | Size |
|---|---|
| etd22923.pdf | 1.6 MB |