In RNA gene finding, the covariance model, a probabilistic model based on context-free grammar, provides excellent accuracy. However, its high computational complexity has limited its usefulness. This research improves the covariance model's search efficiency by building combined models for groups of different RNA families, where each group is selected using a hierarchical clustering strategy. Two approaches for building combined models are proposed and implemented. The first approach uses a greedy algorithm to select base pairs from each original family's secondary structure, forming a new structure from which a combined covariance model is then built. The second approach constructs a series of combined partial covariance models, which are built from stem-loop structural elements and are less complicated than complete models. Experimental results suggest that, for most of the RNA gene families investigated, the combination search method improves run time while retaining acceptable accuracy. Although limitations remain, such as recall loss for a few RNA gene families, this novel combination approach has implications for future studies on reducing the search complexity of covariance models.
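The abstract does not state how the greedy algorithm ranks or filters base pairs, so the following is only an illustrative sketch of the general idea, not the thesis's actual procedure. It assumes that each family's secondary structure is given in dot-bracket notation over a shared set of alignment columns, that structures contain no pseudoknots, and that a pair's priority is simply the number of families containing it. All function names (`parse_pairs`, `conflicts`, `greedy_combined_structure`) are hypothetical.

```python
from collections import Counter


def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def conflicts(p, q):
    """True if pairs p and q share a position or cross each other."""
    (i, j), (k, l) = p, q
    if len({i, j, k, l}) < 4:
        return True
    return i < k < j < l or k < i < l < j


def greedy_combined_structure(structures):
    """Greedily merge several aligned structures into one nested structure.

    Assumption (not from the source): pairs are ranked by how many
    input structures contain them, with ties broken by position.
    """
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("structures must share one coordinate system")

    support = Counter()
    for s in structures:
        support.update(parse_pairs(s))

    # Most-supported pairs first; positional tie-break keeps it deterministic.
    ranked = sorted(support, key=lambda p: (-support[p], p))

    chosen = []
    for pair in ranked:
        # Keep a pair only if it is compatible with everything chosen so far.
        if not any(conflicts(pair, c) for c in chosen):
            chosen.append(pair)

    combined = ["."] * length
    for i, j in chosen:
        combined[i], combined[j] = "(", ")"
    return "".join(combined)


example = greedy_combined_structure(["((..))....", "((..)).(.)", ".(..)(...)"])
```

In this toy input, the pair supported by only one family that shares a position with a better-supported pair is dropped, so the combined structure stays nested and could in principle serve as the consensus structure for a single model.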
Copyright is held by the author.
The author granted permission for the file to be printed, but not for the text to be copied and pasted.
Thesis advisor: Wiese, Kay