Karakoc, Emre

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2007

Authors/Contributors

Author: Karakoc, Emre

Abstract

This thesis is on applying standard combinatorial optimization methods, dynamic programming and linear programming, to help solve two important problems in computational molecular biology: (1) predicting the secondary structure of RNA molecules and (2) predicting the functionality of small biological compounds. After 25 years of effort, the RNA secondary structure prediction has proven to be very elusive. Much of the available algorithms are based on total free energy minimization. Yet, despite the numerous attempts to perfect this thermodynamic approach, the end results are far from being practical. We demonstrate that delocalizing the thermodynamic cost of forming an RNA substructure through energy density notion can significantly improve available secondary structure prediction methods. Because the notion of energy density is non-linear, the standard dynamic programming approach had to be updated. This updated algorithm can capture the secondary structure of many non-coding RNAs which have been difficult to approximate with alternative methods. One key application of RNA structure prediction is in understanding how two or more RNAs interact (e.g. an mRNA and a regulatory RNA). In this thesis we formulate the RNA-RNA interaction prediction problem as a combinatorial optimization problem and show how to solve it again via dynamic programming. Because the complexity of the algorithm to solve the most involved formulation of the problem is very high, we also describe heuristic shortcuts, which, in practice, are highly accurate. The second set of problems we tackle are related to small chemical molecules, which have key cellular functions. In particular we focus on structural similarity search among small chemical molecules, a standard approach used for in-silico drug discovery. It is possible to use structural similarity to deduce the bioactivities of new compounds provided that the notion of similarity reflects the bioactivity in question and we have efficient data structures to perform structural similarity search. This thesis shows how to computationally design the ``optimal'' weighted Minkowski distance wL_p for maximizing the discrimination between active and inactive compounds with respect to a bioactivity. It also demonstrates how to construct an iterative pruning based data structure for performing ``nearest neighbor'' search under the weighted L_p distance computed.

Keywords

Copyright statement

Copyright is held by the author.

Permissions

The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact summit-permissions@sfu.ca.

Scholarly level

Graduate student (PhD)

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd2985.pdf	21.92 MB

Computational studies on structure and functionality of biomolecular compounds

Keywords

Views & downloads - as of June 2023