Prokaryotic protein subcellular localization prediction and genome-scale comparative analysis

Thesis type
(Thesis) Ph.D.
Date created
2011-12-14
Abstract
Identifying protein subcellular localization (SCL) is important for deducing protein function, annotating newly sequenced genomes, and guiding experimental design. Identification of cell surface-bound and secreted proteins from pathogenic bacteria may lead to the discovery of biomarkers, novel vaccine components, and therapeutic targets. Characterizing such proteins in non-pathogenic bacteria and archaea can have industrial uses or play a role in environmental detection. The Brinkman lab previously developed PSORTb, the most precise SCL prediction software tool for bacteria. However, as the diversity of prokaryotic species and their cellular structures has become increasingly appreciated, it has become clear that more accurate predictions are needed for a wider range of microbes. For my thesis research, I developed a new version of PSORTb (PSORTb 3.0) that provides SCL prediction capability for more prokaryotes, including Archaea and Bacteria with atypical cell wall and membrane structures. The new PSORTb also has significantly increased proteome prediction coverage for all bacterial species. The software is the first of its kind to predict subcategory localizations for bacterial organelles such as the flagellum, as well as host cell destinations. Using both computational validations and a new proteomic dataset I produced, I established that PSORTb 3.0 outperforms all other published prokaryotic SCL prediction tools in terms of both precision and recall. Furthermore, I developed a semi-automated version of a comprehensive prokaryotic SCL database (PSORTdb) that provides access to experimentally verified localizations and pre-computed SCL predictions for all sequenced prokaryotic genomes. I developed an ‘outer membrane prediction method’ that allows auto-detection of bacterial structure, distinguishing bacteria with one membrane from those with two. This method allows the database to be updated automatically as newly sequenced genomes are released. In addition, the method can aid more general analysis of a bacterial genome when the cellular structure of the bacterium is not initially clear. Finally, I performed a global analysis of SCL proportions for over 1000 sequenced bacterial and archaeal genomes. This is the most comprehensive SCL analysis of prokaryotes to date. My findings provide insights into prokaryotic protein network evolution, elucidate relationships between SCL proportions and genome size, and provide directions for future SCL prediction research.
Identifier
etd6956
Copyright statement
Copyright is held by the author.
Permissions
The author granted permission for the file to be printed and for the text to be copied and pasted.
Supervisor or Senior Supervisor
Thesis advisor: Brinkman, Fiona
Attachment Size
etd6956_NYu.pdf 2.49 MB