Computing Science, School of


deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2011
Abstract: 

Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR) and trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
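
The abstract above summarizes classifier quality as the area under the ROC curve (AUC). As a reminder of what that number measures, the following is a minimal sketch, not the deFuse implementation: it computes AUC as the fraction of (positive, negative) pairs that the classifier ranks correctly. The function name `roc_auc` and the toy labels and scores are assumptions made up for illustration; they are not data from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking formulation.

    labels: sequence of 0/1 values (1 = confirmed fusion, 0 = confirmed non-fusion).
    scores: classifier scores, higher meaning "more likely a true fusion".
    A tie between a positive and a negative counts as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 1.0 means every true fusion outscores every false one, while 0.5 corresponds to random ranking. The quadratic loop is adequate for a validation set of about a hundred sequences.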

Document type: 
Article

Module Discovery by Exhaustive Search for Densely Connected, Co-Expressed Regions in Biomolecular Interaction Networks

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2010
Abstract: 

Background

Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well-studied large-scale data sources, available for many organisms that are not yet exhaustively annotated. It has been well established that, when these two data sources are analyzed jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear, and methods by which to exhaustively search for such constellations had not been presented.

Methodology/Principal Findings

We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), by which the aforementioned search problem becomes tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automated filtering procedure reduces our output, resulting in smaller collections of modules comparable to those of state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition which adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search by using the unreduced output to more quickly perform GO-term-related function prediction tasks. We point out the advantages of our exhaustive output by predicting functional relationships using two examples.

Conclusion/Significance

We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and largely available large-scale datasets.

Document type: 
Article

Linearization of Ancestral Multichromosomal Genomes

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2012
Abstract: 

Background

Recovering the structure of ancestral genomes can be formalized in terms of properties of binary matrices such as the Consecutive-Ones Property (C1P). The Linearization Problem asks to extract, from a given binary matrix, a maximum weight subset of rows that satisfies such a property. This problem is intractable in general, and in particular when the ancestral genome is expected to contain only linear chromosomes or a unique circular chromosome. In the present work, we consider a relaxation of this problem, which allows ancestral genomes that can contain several chromosomes, each either linear or circular.

Results

We show that, when restricted to binary matrices of degree two, which correspond to adjacencies, the genomic characters used in most ancestral genome reconstruction methods, this relaxed version of the Linearization Problem is polynomially solvable using a reduction to a matching problem. This result holds in the more general case where columns have bounded multiplicity, which models possibly duplicated ancestral genes. We also prove that for matrices with rows of degrees 2 and 3, without multiplicity and without weights on the rows, the problem is NP-complete, thus tracing sharp tractability boundaries.

Conclusion

As happened for the breakpoint median problem, also used in ancestral genome reconstruction, relaxing the definition of a genome turns an intractable problem into a tractable one. The relaxation is adapted to some biological contexts, such as bacterial genomes with several replicons, possibly partially assembled. Algorithms can also be used as heuristics for hard variants. More generally, this work opens a way to better understand linearization results for ancestral genome structure inference.
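
To make the degree-two case in this abstract concrete: each row with two ones is an adjacency between two markers (columns), so the rows can be read as edges of a graph. Under the relaxed genome definition, and assuming no multiplicity, a set of adjacencies is realizable when that graph consists only of paths (linear chromosomes) and cycles (circular chromosomes), which is the same as saying that no marker appears in more than two adjacencies. The sketch below only checks this feasibility condition. It is not the paper's matching-based optimization, and the function name and input format are assumptions chosen for illustration.

```python
from collections import defaultdict


def is_multichromosomal_genome(adjacencies):
    """Return True if the adjacencies form only paths and cycles.

    adjacencies: iterable of (marker_a, marker_b) pairs, one per degree-two row.
    Assumes no multiplicity: every marker occurs once in the ancestral genome,
    so it can have at most two neighbours.
    """
    degree = defaultdict(int)
    seen = set()
    for a, b in adjacencies:
        if a == b:
            return False  # a marker cannot be adjacent to itself
        key = frozenset((a, b))
        if key in seen:
            continue  # a repeated row adds no new adjacency
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Maximum degree two <=> disjoint union of paths and cycles.
    return all(d <= 2 for d in degree.values())


# Hypothetical examples: a circular chromosome 1-2-3 plus a linear one 4-5,
# and a conflicting set in which marker 1 has three neighbours.
consistent = is_multichromosomal_genome([(1, 2), (2, 3), (3, 1), (4, 5)])
conflicting = is_multichromosomal_genome([(1, 2), (1, 3), (1, 4)])
```

When the check fails, the optimization problem described in the abstract asks which maximum-weight subset of rows to keep so that the condition holds again.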

Document type: 
Article

smyRNA: A Novel Ab Initio ncRNA Gene Finder

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2009-05-05
Abstract: 

Background

Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by means of establishing stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Although ncRNAs are abundant, discovering novel ncRNAs on genome sequences has proven to be a hard task; in particular, past attempts at ab initio ncRNA search mostly failed, with the exception of tools that can identify microRNAs.

Methodology/Principal Findings

We present a very general ab initio ncRNA gene finder that exploits differential distributions of sequence motifs between ncRNAs and background genome sequences.

Conclusions/Significance

Our method, once trained on a set of ncRNAs from a given species, can be applied to genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability: http://compbio.cs.sfu.ca/taverna/smyrna

Document type: 
Article

The Generation Challenge Programme Platform: Semantic Standards and Workbench for Crop Science

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2008
Abstract: 

The Generation Challenge Programme (GCP) is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. A key consortium research activity is the development of a GCP crop bioinformatics platform to support GCP research. This platform includes the following: (i) shared, public platform-independent domain models, ontology, and data formats to enable interoperability of data and analysis flows within the platform; (ii) web service and registry technologies to identify, share, and integrate information across diverse, globally dispersed data sources, as well as to access high-performance computational (HPC) facilities for computationally intensive, high-throughput analyses of project data; (iii) platform-specific middleware reference implementations of the domain model integrating a suite of public (largely open-access/-source) databases and software tools into a workbench to facilitate biodiversity analysis, comparative analysis of crop genomic data, and plant breeding decision making.

Document type: 
Article

Conditional Random Fields and Supervised Learning in Automated Skin Lesion Diagnosis

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2011
Abstract: 

Many subproblems in automated skin lesion diagnosis (ASLD) can be unified under a single generalization of assigning a label, from a predefined set, to each pixel in an image. We first formalize this generalization and then present two probabilistic models capable of solving it. The first model is based on independent pixel labeling using maximum a posteriori (MAP) estimation. The second model is based on conditional random fields (CRFs), where dependencies between pixels are defined using a graph structure. Furthermore, we demonstrate how supervised learning and an appropriate training set can be used to automatically determine all model parameters. We evaluate both models' ability to segment a challenging dataset consisting of 116 images and compare our results to five previously published methods.
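
The first model in this abstract, independent pixel labeling by MAP estimation, can be illustrated with a small sketch: each pixel receives the label that maximizes prior times likelihood, with no interaction between neighbouring pixels. Everything specific here is an assumption for illustration and is not taken from the paper: the single intensity feature per pixel, the Gaussian class-conditional likelihoods, the two label names, and the hand-picked parameters (which the paper instead learns from a training set).

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian, used as a stand-in likelihood model."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def map_label(pixel, classes):
    """Pick the label maximizing log prior + log likelihood for one pixel.

    classes: dict mapping label -> (prior, mean, std).
    """
    def log_posterior(label):
        prior, mean, std = classes[label]
        return math.log(prior) + gaussian_log_pdf(pixel, mean, std)

    return max(classes, key=log_posterior)


def label_image(image, classes):
    """Label every pixel independently; image is a list of rows of intensities."""
    return [[map_label(p, classes) for p in row] for row in image]


# Hypothetical parameters and a 2x3 toy image with intensities in [0, 1].
toy_classes = {"lesion": (0.3, 0.25, 0.1), "skin": (0.7, 0.75, 0.1)}
toy_image = [[0.2, 0.3, 0.8], [0.7, 0.1, 0.9]]
toy_labels = label_image(toy_image, toy_classes)
```

Because each pixel is decided in isolation, this model can produce speckled segmentations. The CRF model mentioned in the abstract addresses that by adding dependencies between pixels through a graph structure.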

Document type: 
Article

Improvement and Performance Evaluation for Multimedia Files Transmission in Vehicle-Based DTNs

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2013
Abstract: 

In recent years, P2P file sharing has been widely embraced and has become the largest contributor to Internet traffic. The development of the automobile industry has promoted a trend of deploying Peer-to-Peer (P2P) networks over vehicle ad hoc networks (VANETs) for mobile content distribution. Due to the high mobility of nodes, nodes’ limited radio transmission range, and sparse distribution, VANETs are divided and links are interrupted intermittently. Under these conditions, VANETs may become Vehicle-based Delay Tolerant Networks (VDTNs). Therefore, this work proposes an Optimal Fragmentation-based Multimedia Transmission scheme (OFMT) based on a P2P lookup protocol in VDTNs, which enables multimedia files to be sent to the receiver quickly and reliably in wireless mobile P2P networks over VDTNs. In addition, a method of calculating the most suitable fragment size is provided, which is tested and verified in simulation. We also show that OFMT can withstand DoS attacks to a certain degree and that senders can freely join and leave the wireless mobile P2P network. Simulation results demonstrate that the proposed scheme can significantly improve the file delivery rate and shorten the file delivery delay compared with existing schemes.

Document type: 
Article

Organization and Evolution of Primate Centromeric DNA from Whole-Genome Shotgun Sequence Data

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2007-09
Abstract: 

The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

Document type: 
Article

Not All Scale-Free Networks Are Born Equal: The Role of the Seed Graph in PPI Network Evolution

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2007-07
Abstract: 

The (asymptotic) degree distributions of the best-known “scale-free” network models are all similar and are independent of the seed graph used; hence, it has been tempting to assume that networks generated by these models are generally similar. In this paper, we observe that several key topological features of such networks depend heavily on the specific model and the seed graph used. Furthermore, we show that starting with the “right” seed graph (typically a dense subgraph of the protein–protein interaction network analyzed), the duplication model captures many topological features of publicly available protein–protein interaction networks very well.
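
To illustrate how a seed graph enters a duplication-style growth process, here is a minimal sketch of one common formulation: at each step an existing node is chosen uniformly at random and copied, the copy keeps each of the original's edges with probability `p`, and it is linked to the original itself with probability `r`. The parameter names `p` and `r`, the function name, and the integer node labels are assumptions for illustration, and the abstract does not specify which variant of the model the paper uses.

```python
import random


def grow_by_duplication(seed_edges, target_nodes, p, r=0.0, rng=None):
    """Grow an undirected graph from a seed graph by node duplication.

    seed_edges: iterable of (u, v) pairs with integer node labels.
    target_nodes: stop once the graph has this many nodes.
    p: probability that the copy keeps each edge of the duplicated node.
    r: probability that the copy is also linked to the duplicated node.
    Returns an adjacency dict {node: set_of_neighbours}.
    """
    rng = rng or random.Random()
    adj = {}
    for u, v in seed_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        raise ValueError("the seed graph must contain at least one edge")
    while len(adj) < target_nodes:
        nodes = sorted(adj)
        template = rng.choice(nodes)  # node to duplicate
        new = nodes[-1] + 1           # fresh integer label
        neighbours = {n for n in sorted(adj[template]) if rng.random() < p}
        if rng.random() < r:
            neighbours.add(template)
        adj[new] = neighbours
        for n in neighbours:
            adj[n].add(new)
    return adj


# Hypothetical run: grow a 20-node network from a triangle seed.
toy_graph = grow_by_duplication([(0, 1), (1, 2), (2, 0)], 20, p=0.5, r=0.1,
                                rng=random.Random(0))
```

Running the same routine from different seeds (for example a triangle versus a larger dense subgraph) and comparing features beyond the degree distribution is the kind of experiment the abstract describes.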

Document type: 
Article