Thesis type
(Project) M.Sc.
Date created
2022-04-29
Authors/Contributors
Author: Chen, Winfield
Abstract
Large genomic resources such as UK Biobank involve hundreds of thousands of subjects and are being established for prospective epidemiological cohort studies, with the goal of improving the screening and treatment of disease. Genome-wide association studies (GWAS) on these resources face time and space efficiency issues that are amplified at the population level. We present two new methods to mitigate these issues. First, we present a new compressed file format and associated software that exploit properties of the statistical distribution of population genetics files, enabling GWAS that run faster and require less storage, which reduces the cost of GWAS research. We benchmark this method on Thousand Genomes Project data against the current state of the art and find a significant increase in space efficiency. Second, we present software implementing an efficient clustering method for the associations discovered in such studies. We apply the method to GWAS of nearly 4,000 brain imaging phenotypes from UK Biobank and obtain results associated with pathways involved in various diseases.
Document
Identifier
etd21959
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Elliott, Lloyd
Language
English
Member of collection
Download file | Size |
---|---|
etd21959.pdf | 1.39 MB |