Mannan Riad, Mohammad Riadul

Resource type

Thesis

Thesis type

(Project) M.Sc.

Date created

2005

Authors/Contributors

Author: Mannan Riad, Mohammad Riadul

Abstract

Over the years, many methods have been developed for clustering protein sequences based on their similarity. However, most of these methods rely on all-against-all sequence comparison, which requires computation at least quadratic in the number of sequences. Furthermore, many methods do not explicitly address the issues and challenges associated with protein clustering, such as finding distant relatives and detecting multi-domain proteins. Here, we develop a novel representative-based clustering technique that avoids the all-against-all pairwise sequence comparison. We address these protein clustering issues in detail and give a solution for finding distant relatives and multi-domain proteins. We also develop a new similarity measure that captures significant similarity information embedded in a sequence, such as frequent patterns and sequence length.
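The general idea of representative-based clustering can be illustrated with a minimal sketch. It is not the thesis's algorithm, which the abstract does not specify. The sketch swaps in a simple greedy "leader" scheme: each sequence is compared only against the current cluster representatives, not against every other sequence. The similarity score is likewise a hypothetical stand-in. Overlapping k-mers approximate "frequent patterns", and a length ratio penalizes sequences of very different lengths. The names `kmer_profile`, `similarity`, `cluster_by_representatives`, `k`, and `threshold` are all illustrative assumptions.

```python
from collections import Counter


def kmer_profile(seq, k=3):
    """Count overlapping k-mers (hypothetical stand-in for 'frequent patterns')."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def similarity(a, b, k=3):
    """Hypothetical score in [0, 1]: shared k-mer mass scaled by a length ratio."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    total = min(sum(pa.values()), sum(pb.values()))
    if total == 0:
        return 0.0  # at least one sequence is shorter than k
    shared = sum((pa & pb).values())  # multiset intersection keeps min counts
    length_ratio = min(len(a), len(b)) / max(len(a), len(b))
    return (shared / total) * length_ratio


def cluster_by_representatives(seqs, threshold=0.5, k=3):
    """Greedy leader clustering: each sequence is scored only against representatives."""
    reps = []      # index of each cluster's representative sequence
    clusters = []  # member indices for each cluster
    for i, s in enumerate(seqs):
        best, best_score = None, threshold
        for c, r in enumerate(reps):
            score = similarity(s, seqs[r], k)
            if score >= best_score:
                best, best_score = c, score
        if best is None:
            # No representative is similar enough: s founds a new cluster.
            reps.append(i)
            clusters.append([i])
        else:
            clusters[best].append(i)
    return clusters


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSWLLPPQE"]
    print(cluster_by_representatives(seqs))  # [[0, 1], [2]]
```

The cost here scales with the number of sequences times the number of representatives rather than with all pairs. It can still approach quadratic when almost every sequence founds its own cluster.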

Copyright statement

Copyright is held by the author.

Permissions

The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact summit-permissions@sfu.ca.

Scholarly level

Graduate student (Masters)

Language

English

Member of collection

Computing Science Theses

Download file

etd1718.pdf (969.79 KB)

Representative based protein sequence clustering
