Flow cytometry (FCM) bioinformatics is a sub-field of bioinformatics aimed at developing effective and efficient computational tools to store, organize, and analyze high-throughput, high-dimensional FCM data. Flow cytometers are capable of analyzing thousands of cells per second for up to 40 features. These features primarily signal the presence of different proteins on cells in the bloodstream, so FCM contributes large amounts of data to the big biological data paradigm. The data that a flow cytometer outputs from a biological sample is called an FCS file. The International Mouse Phenotyping Consortium (IMPC) is a collaboration between 23 international institutions and funding organizations. Its aim is to decipher the function of 20,000 mouse genes. IMPC does so by breeding mice in which a specific gene is knocked out (KO), cancelling the function of that gene. In turn, FCM is used to measure the immunological changes correlated with this knockout. Many tools exist to classify FCS files. However, there is a lack of tools to conduct unsupervised clustering of FCS files. One goal of IMPC is to compare and contrast KO genes, which makes IMPC a prime motivation for this problem. As such, this thesis outlines a data processing pipeline used to isolate features for each FCS file. We then test the different types of extracted features on a benchmark data set from the FlowCAP-II challenge, containing data from healthy persons and patients with acute myeloid leukemia (AML). Finally, we evaluate how well these features separate FCS files of different origin (i.e., healthy vs. AML).
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.