
Quantifying trends in bacterial virulence and pathogen-associated genes through large scale bioinformatics analysis

Thesis type
(Thesis) Ph.D.
With an increasing number of bacterial genomes becoming available, we can now investigate and quantify selected general trends in pathogenicity that are shared across diverse pathogens and that have previously been reported anecdotally but not yet quantified on a larger scale. We can also apply higher-throughput approaches to identify virulence-associated genes that represent possible therapeutic or prophylactic targets. In this study, I systematically examined up to 267 pathogen and non-pathogen genomes from diverse genera and identified trends associated with a curated data set of known bacterial virulence factors (VFs). In support of previous anecdotal statements, I show that genomic islands (clusters of genes of probable horizontal origin) contain disproportionately more VFs than the rest of a given genome (p < 2.20E-16), supporting their important role in pathogen evolution. To gain insight into the types of genes that may play a more virulence-specific role in pathogens, I also performed an analysis to identify pathogen-associated genes (genes found predominantly in pathogens across multiple genera but not in non-pathogens). I found that disproportionately high numbers of pathogen-associated VFs are “offensive” (involved in active invasion of the host), such as certain types of toxins and Type III and Type IV secretion systems. Some of the pathogen-specific genes identified have apparently not yet been examined for their potential as vaccine components or drug targets and merit further study. As a first step toward more sophisticated analyses of trends in virulence, I also developed a Virulence Gene Experiment Database (VGEDB) that incorporates contextual information about virulence. This database is unique in that each entry describes a particular virulence gene experiment rather than a virulence gene.
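The abstract reports that genomic islands are enriched for VFs (p < 2.20E-16) but does not name the statistical test used. As a hedged illustration only, the sketch below assumes a one-sided Fisher's exact test on a 2×2 table of gene counts (VF vs. non-VF, inside vs. outside genomic islands), computed from the hypergeometric distribution with the standard library. The function name `fisher_greater` and all counts are hypothetical and are not taken from the thesis.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment of cell `a`.

    Assumed (hypothetical) table layout:
                      VF    non-VF
        in island      a       b
        outside        c       d
    Returns P(X >= a) where X is hypergeometric under the null of no association.
    """
    n_total = a + b + c + d
    n_vf = a + c        # total VFs across the genome(s)
    n_island = a + b    # total genes located inside genomic islands
    denom = comb(n_total, n_island)
    upper = min(n_vf, n_island)
    # Sum probabilities of seeing a or more VFs among the island genes.
    tail = sum(
        comb(n_vf, x) * comb(n_total - n_vf, n_island - x)
        for x in range(a, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical toy counts, for illustration only (not data from the thesis).
    p = fisher_greater(30, 470, 20, 4480)
    print(f"one-sided p-value: {p:.3g}")
```

An exact test is shown because it needs no large-sample approximation; with counts as large as those in a multi-genome study, a chi-square test would give a similar conclusion, and either choice here is an assumption rather than the author's documented method.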
I used this database in part to investigate a common BLAST-based approach for computationally identifying VFs in genomic sequences. My analysis suggests that this widely used VF-prediction method is highly inaccurate. Overall, this work provides the first large-scale, multi-genera, quantitative data describing selected trends in bacterial virulence, and it offers global insights into pathogen evolution and into the pathogen-associated traits of primary importance to a pathogenic lifestyle.
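The abstract does not specify how the accuracy of BLAST-based VF prediction was measured. One common way to quantify such accuracy, sketched here under stated assumptions, is to call a gene a predicted VF when its best BLAST hit passes identity and E-value cutoffs, then compare the predictions with an experimentally supported reference set using precision and recall. The `Hit` record, the cutoff values, and the gene IDs below are all hypothetical and are not the thesis's method or data.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass
class Hit:
    """Hypothetical best-hit summary for one query gene against a VF database."""
    gene_id: str
    percent_identity: float
    evalue: float


def predict_vfs(hits: Iterable[Hit], min_identity: float = 40.0,
                max_evalue: float = 1e-5) -> Set[str]:
    # Cutoffs are illustrative assumptions, not values from the thesis.
    return {
        h.gene_id
        for h in hits
        if h.percent_identity >= min_identity and h.evalue <= max_evalue
    }


def precision_recall(predicted: Set[str], true_vfs: Set[str]) -> Tuple[float, float]:
    # Precision: fraction of predicted VFs that are true VFs.
    # Recall: fraction of true VFs that were predicted.
    tp = len(predicted & true_vfs)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true_vfs) if true_vfs else 0.0
    return precision, recall


if __name__ == "__main__":
    hits = [
        Hit("g1", 90.0, 1e-50),
        Hit("g2", 35.0, 1e-10),   # fails the identity cutoff
        Hit("g3", 60.0, 1e-3),    # fails the E-value cutoff
        Hit("g4", 55.0, 1e-20),
    ]
    reference = {"g1", "g2"}      # hypothetical experimentally supported VFs
    print(precision_recall(predict_vfs(hits), reference))
```

Reporting precision and recall separately shows whether a homology-based method over-calls VFs (low precision) or misses them (low recall); a single "accuracy" figure can hide that distinction.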
Copyright statement
Copyright is held by the author.
The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact
