Liu, Sichen

Resource type

Graduating extended essay / Research project

Date created

2020-08-19

Authors/Contributors

Author: Liu, Sichen

Abstract

Big data curation is often underappreciated by users of processed data. With the development of high-throughput genotyping technology, large-scale genome-wide data are now available for analyzing genetic associations with disease. In this project, we describe a data-curation protocol for handling genotyping errors and missing values in genetic data. We obtain publicly available genetic data from three studies in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and, with the aid of the freely available HapMap3 reference panel, improve the quality and size of the ADNI genetic data. We use PLINK to manage data formats; SHAPEIT to check DNA strand alignment and to phase the genetic markers, that is, to infer which alleles were inherited together from the same parent; IMPUTE2 to impute missing SNP genotypes; and GTOOL to merge files and convert file formats. After merging the genetic data across these studies, we also use the reference panel to investigate the population structure of the processed data. ADNI participants were recruited in the U.S., where the majority of the population are descendants of relatively recent immigrants. We use principal component analysis to understand the population structure of the participants, and model-based clustering to estimate the genetic composition of each participant and compare it with self-reported ethnicity information. This project is intended to serve as a guide for future users of the processed data.
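
As an illustration of the principal component analysis step mentioned in the abstract, the following is a minimal sketch in Python with NumPy. It is not the procedure used in the thesis: the function name `genotype_pca`, the simulated two-group data, and the choice to standardize each SNP by its allele frequency are all assumptions made for this example, and real genotype data would first need the curation, phasing, and imputation steps described above.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components.

    genotypes: (n_samples, n_snps) array of minor-allele counts (0, 1, 2),
    assumed complete (no missing values). Hypothetical helper for illustration.
    """
    g = np.asarray(genotypes, dtype=float)
    # Estimate each SNP's allele frequency and drop SNPs with no variation.
    freq = g.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)
    g, freq = g[:, keep], freq[keep]
    # Centre each SNP and scale by its binomial standard deviation.
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    # SVD of the standardized matrix; the scores are U scaled by singular values.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated data (an assumption, not ADNI data): two groups of 30 samples
# whose allele frequencies differ across 500 SNPs.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=500), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(30, 500))
group_b = rng.binomial(2, freq_b, size=(30, 500))

scores = genotype_pca(np.vstack([group_a, group_b]))
print(scores.shape)
```

In a sketch like this, samples from the two simulated groups tend to fall on opposite sides of the first component, which is the kind of pattern one would compare against self-reported ethnicity.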

Identifier

etd20976

Copyright statement

Copyright is held by the author.

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Scholarly level

Graduate student (Masters)

Member of collection

Statistics and Actuarial Science Theses

Download file	Size
etd20976.pdf	2.12 MB

Curating and combining big data from genetic studies
