Inferring gene-environment interaction from case-parent trio data: evaluation of and adjustment for spurious GxE and development of a data-smoothing method to uncover true GxE

Case-parent trios
Gene-environment interaction
Genotype relative risk
Population stratification
Generalized additive model
Penalized maximum likelihood estimation

Most complex diseases are influenced jointly by genes (G) and environmental or other non-genetic attributes (E). Gene-environment interaction (GxE) is measured by the statistical interaction between G and E, which occurs when genotype relative risks (GRRs) vary with E. In this thesis, we explore the sources of spurious GxE and propose a data-smoothing approach to inferring GxE from case-parent trio data. In the first project, we address the problem of making inference about GxE based on the rates at which alleles are transmitted from parents to affected offspring. Since GRRs that vary with E lead to transmission rates that also vary with E, transmission rates have been used to make inference about GxE. However, transmission-based tests of GxE turn out to be invalid in general. To understand the bias of the transmission-based test, we derive theoretical transmission rates and compare their variation with E to that of the GRRs. Through simulation, we investigate the practical implications of this bias. Valid approaches that are not based on transmission rates either require a parametric form for GxE to be specified or are designed to work well under one. In the second project, we develop a data-smoothing method to explore GxE that, when genotypes are available for a causal marker, does not require a model to be specified for the interaction component. The data-driven method produces graphical displays of GxE that suggest its form. To test the significance of GxE, we take a permutation approach that accounts for the additional uncertainty introduced by the smoothing process. For many approaches to inference of GxE with case-parent trio data, including our own, a key assumption is that the test marker is causal; in reality, however, the marker may not be causal but instead in linkage disequilibrium with a causal locus. In this case, the approaches can give a false impression of GxE because of a form of population stratification that has not been well appreciated.
In the final project, we investigate, through simulation, the source of the spurious GxE and propose an adjustment that uses additional unlinked markers genotyped in the affected offspring.
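As a rough illustration of why transmission rates need not vary with E in the same way as GRRs, the sketch below computes the pooled rate at which heterozygous parents transmit the risk allele to an affected child. It rests on simplifying assumptions of our own, not necessarily those used in the thesis: a causal biallelic marker, random mating, Hardy-Weinberg proportions in the parents with risk-allele frequency p (q = 1 - p), and GRRs R1 (one copy vs. none) and R2 (two copies vs. none). Under these assumptions the rate is τ = (qR1 + pR2) / (q(1 + R1) + p(R1 + R2)). The function name `het_transmission_rate` is hypothetical.

```python
def het_transmission_rate(p: float, r1: float, r2: float) -> float:
    """Pooled probability that a heterozygous parent transmits the risk
    allele to an affected child (illustrative assumptions: causal marker,
    random mating, Hardy-Weinberg parents, risk-allele frequency p,
    GRRs r1 and r2 relative to the zero-copy genotype)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    # Under random mating and HWE, the other parent transmits the risk
    # allele with marginal probability p, independently of this parent.
    # Transmitted: child has 1 + j risk alleles (j = other parent's allele).
    transmitted = q * r1 + p * r2
    # Not transmitted: child has 0 + j risk alleles; R_0 = 1 by definition.
    not_transmitted = q * 1.0 + p * r1
    # The baseline risk and the 1/2 Mendelian factor cancel in the ratio.
    return transmitted / (transmitted + not_transmitted)


# Multiplicative model (r2 = r1**2): the rate is r1 / (1 + r1) for any p.
for p in (0.1, 0.5):
    print(round(het_transmission_rate(p, 2.0, 4.0), 4))  # 0.6667 both times

# Dominant model (r1 = r2 = 2): the rate shifts with p although GRRs do not.
for p in (0.1, 0.5):
    print(round(het_transmission_rate(p, 2.0, 2.0), 4))  # 0.6452, then 0.5714
```

Only under the multiplicative model does τ depend on the GRRs alone. Under other models it also depends on the allele frequency, so a transmission rate compared across levels of E can change for reasons other than a change in the GRRs.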

Copyright remains with the author. The author granted permission for the file to be printed and for the text to be copied and pasted.
Jinko Graham
Brad McNeney
Science: Department of Statistics and Actuarial Science
Thesis type: (Thesis) Ph.D.