Burkett, Kelly

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2011-11-25

Authors/Contributors

Author: Burkett, Kelly

Abstract

The gene genealogy is a tree describing the ancestral relationships among genes sampled from unrelated individuals. Knowledge of the tree is useful for inference of population-genetic parameters such as the mutation or recombination rate. It also has potential application in genomic mapping, as individuals with similar trait values will tend to be more closely related genetically at the location of a trait-influencing mutation. One way to incorporate genealogical trees in genetic applications is to sample them conditional on genetic data observed at present. In this thesis, we describe our Markov chain Monte Carlo (MCMC) based genealogy sampler. First, we describe the sampler that conditions on haplotype data. Our implementation is based on the sampler described in Zöllner and Pritchard (2005). However, we have made several changes to increase the efficiency of sampling. We illustrate the use of our sampler on haplotype data from a publicly available dataset, where we examine statistics summarizing the degree to which case haplotypes are more related to each other than to control haplotypes. Most genealogy samplers assume that haplotype data for the present-day sequences are available. However, commonly used genotyping technology measures genotypes at single loci rather than haplotypes, and therefore the haplotype data need to be imputed. To avoid single imputation, we then describe how the original sampler was extended to handle the case of only genotype data being available. We apply the sampler to simulated data to evaluate how well it estimates genetic parameters and predicts haplotypes. Adequate mixing of the sampler was a concern for some of the test datasets. The mixing difficulties were attributed to substantial dependence between the tree structure and the latent variables introduced to facilitate sampling of the trees. We describe our experiences with using simulated tempering to improve the mixing of the sampler. Our heated distributions were chosen so that the dependencies between the latent variables and the tree structure were gradually reduced. We apply this approach to a simulated dataset to illustrate how simulated tempering can improve mixing over the haplotype configurations.

Keywords

Identifier

etd6909

Copyright statement

Copyright is held by the author.

Permissions

The author granted permission for the file to be printed and for the text to be copied and pasted.

Scholarly level

Graduate student (PhD)

Supervisor or Senior Supervisor

Thesis advisor: Graham, Jinko

Thesis advisor: McNeney, Brad

Member of collection

Statistics and Actuarial Science Theses

Download file	Size
etd6909_KBurkett.pdf	1.64 MB

Markov chain Monte Carlo sampling of gene genealogies conditional on observed genetic data
