Proteins that share a biological role often contain short sequences, or motifs, that are conserved far more strongly than the surrounding regions. The presence of such function-related motifs can be useful in assigning hypothesized functions to proteins in newly sequenced genomes and, combined with experimental data, can help to discern the mechanism of a protein's function. Because of mutations, these motifs are expected to vary in both length and amino acid content. Several approaches, including Expectation-Maximization and Gibbs Sampling, have been developed to computationally detect overrepresented motifs in a set of protein sequences. These approaches have generally focused on detecting motifs of equal length, and they do not work well with classes of motifs whose instances vary in length. We present a novel approach to detecting gapped motifs, which outperforms several traditional motif discovery approaches on several biologically motivated datasets.
Copyright is held by the author.
The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact email@example.com.