Bacteria are rich in functional diversity, which is enhanced by their ability to readily acquire and transmit novel traits via horizontal gene transfer. Genomic islands (GIs), defined as clusters of genes of probable horizontal origin integrated into the chromosome, are of particular interest because they disproportionately encode novel features, including antimicrobial resistance genes and virulence factors. As genomic approaches to investigating pathogen outbreaks and characterizing the dissemination of genes across microbial populations advance, there is a need for specialized tools to characterize GIs in population datasets. However, previous tools only allowed GI prediction in single genomes. Additionally, it was unclear how well GIs are predicted in incomplete genomes. The aim of my thesis was to facilitate reliable GI comparison in multi-genome datasets and to investigate GIs involved in disseminating genes of interest within datasets of clinical relevance. IslandCompare is a newly released, web-based platform designed to handle hundreds of genomes. My work focused on incorporating novel functionality for clustering GIs and ensuring consistent cross-genome predictions. This functionality, coupled with the platform's visualization, allows users to rapidly identify differences in GI content across genomes. I also led the establishment of a centralized database of curated GIs, which already includes entries from key pathogenic species. Targeted predictions of these curated GI sequences are being incorporated into IslandCompare alongside functional information. I developed an approach for predicting curated Salmonella enterica GIs. My evaluation of draft genomes revealed that current sequence composition-based methods often miss GIs, especially GIs interrupted by contig breaks. Predictions are also less sensitive in metagenome-assembled genomes.
An investigation of Enterococcus faecium genomes from disparate habitats and continents revealed that habitat plays a greater role than geography in phylogenetic differentiation and the associated accessory gene content. Analysis with IslandCompare led to the identification of GI clusters involved in the dissemination of antimicrobial resistance genes and virulence factors, including vancomycin resistance and surface adhesion genes. Collectively, my work will facilitate a better understanding of computational GI prediction and the comparison of GI content across population datasets, including cases where GIs play a role in the dissemination of genes of clinical and environmental importance.
Copyright is held by the author(s).