Computational ortholog prediction: evaluating use cases and improving high-throughput performance

Resource type
Thesis type
(Thesis) Ph.D.
Date created
2013-03-08
Authors/Contributors
Abstract
Orthologs are genes in different species that diverged from a common ancestral gene when those species diverged. High-throughput computational methods for ortholog prediction are a key component of many computational biology analyses. A fundamental premise in these analyses is that orthologs (when predicted correctly) are functionally equivalent and can be used to transfer gene annotations across species. Many existing ortholog prediction methods generate a sizeable number of incorrect ortholog predictions, especially in cases of complex gene evolution. My thesis examines the functional equivalence hypothesis further and presents one solution that increases the precision of ortholog prediction. To examine the use of orthologs in computational analysis, I conducted and evaluated three projects that employ ortholog prediction in distinct ways. In these projects, orthologs were used to (1) identify conserved, unique genes in metazoan species, (2) validate predicted gene regulatory modules in Pseudomonas aeruginosa, and (3) construct a transcriptional regulatory network in Aspergillus fumigatus. I identified factors affecting ortholog prediction in these specific use cases, demonstrating how successive gene duplications, incomplete genomes, and rapid evolution of gene regulation can affect the results of such analyses. To improve ortholog prediction, I evaluated and augmented an existing method called Ortholuge, a computational method that increases the precision of ortholog prediction in a high-throughput setting. I evaluated the performance of Ortholuge, showing that its approach of classifying orthologs based on their relative phylogenetic divergence identifies orthologs that are more functionally equivalent. I compared Ortholuge to the contemporary methods QuartetS and OMA, and showed that Ortholuge consistently identifies functionally equivalent orthologs across a range of taxonomic distances.
I also further developed Ortholuge’s functionality by reducing run time, increasing accuracy, and improving usability through a number of modifications. Lastly, to make Ortholuge results available to the research community, I developed a database of Ortholuge ortholog predictions for bacterial and archaeal species. This online database provides high-level visualization of orthologs and supports complex queries that retrieve genes shared by, or unique to, specified taxa. Overall, this work contributes an enhanced method for precise high-throughput ortholog identification and increases our understanding of the functional equivalence of orthologs.
Document
Identifier
etd7693
Copyright statement
Copyright is held by the author.
Permissions
The author granted permission for the file to be printed and for the text to be copied and pasted.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Brinkman, Fiona
Attachment
etd7693_MWhiteside.pdf (3.97 MB)