Pan-Cancer Identification and Prioritization of Cancer-Associated Alternatively Spliced and Differentially Expressed Genes: A Biomarker Discovery Application

Date created: 
Dr. Steven J.M. Jones
Science: Department of Molecular Biology and Biochemistry
Alternative splicing
Gene expression
Biomarker target
Analytic Hierarchy Process

Tumour cells arise through aberrant expression of genes and the proteins they encode. This may result from a direct change to the DNA sequence or from perturbations in the machinery responsible for the production or activity of proteins, such as gene splicing. With the advent of massively parallel RNA sequencing (RNA-seq), large-scale exploration of changes at the stages of transcription and post-transcriptional splicing has the potential to unravel the landscape of gene expression changes across human cancers. Genes that are aberrantly expressed in cancer can serve as molecular biomarkers for discriminating tumour cells from normal cells and, if their products are localized to the cell surface, as targets for antibody-based cancer therapy. In the current study, I devised an analysis pipeline to identify and rank such events in human cancer RNA-seq datasets. Using my pipeline, I conducted a pan-cancer analysis of RNA-seq data from more than 7,000 patients across 24 cancer types, generated by The Cancer Genome Atlas (TCGA). I identified abnormally expressed and alternatively spliced genes that appeared to be cancer-associated when compared with a large compendium of transcriptomes from non-diseased tissues gathered from the Genotype-Tissue Expression (GTEx) project and TCGA. My analysis revealed 1,503 putative tumour-associated abnormally expressed genes and 1,142 novel cancer-associated splice variants occurring in 694 genes. To rank the identified candidate genes, I performed an extensive literature search and studied known therapeutic antibody targets to collect the characteristics of an ideal antibody target in cancer. I developed an R package, Prize, based on the Analytic Hierarchy Process (AHP) algorithm. AHP is a multiple-criteria decision-making method that allows a user to prioritize a list of elements based on a set of user-defined criteria and numerical scores that express the importance of each criterion to achieving the goal.
I built an AHP model that captures the properties of a cancer biomarker target and used it to rank and prioritize the genes. Using this model, Prize successfully recognized known tumour biomarker targets and ranked them among the top 25 genes, along with other novel candidates.
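
The AHP idea described above can be illustrated with a minimal Python sketch. It is not the Prize package or its API (Prize is an R package), and everything in it is assumed for illustration: the three criteria, the pairwise comparison values, the gene names and the per-criterion scores. It derives criterion weights from a pairwise comparison matrix using the row geometric mean approximation, a common stand-in for the principal eigenvector, and then ranks candidates by their weighted scores.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights using the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank_candidates(scores, weights):
    """Rank candidates by the weighted sum of their per-criterion scores."""
    totals = {
        name: sum(w * s for w, s in zip(weights, values))
        for name, values in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria, chosen for illustration only.
criteria = ["cell_surface_localization", "tumour_overexpression", "low_normal_expression"]

# Hypothetical pairwise comparisons: pairwise[i][j] states how much more
# important criterion i is than criterion j, so pairwise[j][i] == 1 / pairwise[i][j].
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

weights = ahp_weights(pairwise)

# Hypothetical candidates with made-up scores in [0, 1], one per criterion.
scores = {
    "GENE_A": [0.9, 0.8, 0.7],
    "GENE_B": [0.2, 0.9, 0.9],
    "GENE_C": [0.6, 0.4, 0.5],
}

ranking = rank_candidates(scores, weights)
for name, total in ranking:
    print(f"{name}: {total:.3f}")
```

Because the example matrix is perfectly consistent, the geometric mean weights match the eigenvector solution exactly. With real, inconsistent judgements, a full AHP analysis would also check a consistency ratio before trusting the weights.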

Thesis type: 
(Thesis) Ph.D.
Document type: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.