Molecular Biology and Biochemistry, Department of

Evaluation of Genomic Island Predictors Using a Comparative Genomics Approach

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2008
Abstract: 

Background: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches.

Results: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset), and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools.

Conclusion: Our results indicate that AlienHunter has the highest recall but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable overall accuracy, the highest of the methods tested. Our comparative genomics approach, IslandPick, was the most accurate when compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation, using diverse and independent datasets that were not artificially constructed, of the accuracy of several sequence composition-based GI predictors. The caveats associated with this analysis and proposals for optimal island prediction are discussed.

Document type: 
Article
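The IslandPick evaluation above rests on precision, recall and accuracy measured against a positive (island) and a negative (non-island) dataset. As a minimal sketch, assuming nucleotide-level scoring in which predicted positions inside the positive set count as true positives and predicted positions inside the negative set count as false positives (the abstract does not state the scoring unit, so this is an assumption for illustration), the metrics could be computed as follows. The names `to_positions` and `evaluate` and the interval format are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Half-open genome coordinates [start, end)
Interval = Tuple[int, int]


def to_positions(intervals: List[Interval]) -> Set[int]:
    """Expand a list of intervals into the set of nucleotide positions they cover."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def evaluate(predicted: List[Interval],
             positives: List[Interval],
             negatives: List[Interval]) -> Dict[str, float]:
    """Score predicted GIs against positive (island) and negative (non-island) sets.

    Positions outside both reference sets are ignored, since they are
    neither confirmed islands nor confirmed non-islands.
    """
    pred = to_positions(predicted)
    pos = to_positions(positives)
    neg = to_positions(negatives)

    tp = len(pred & pos)  # predicted and in the positive dataset
    fp = len(pred & neg)  # predicted but in the negative dataset
    fn = len(pos - pred)  # island positions that were missed
    tn = len(neg - pred)  # non-island positions correctly left unpredicted

    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy example: one predicted island covering half of a reference island
print(evaluate([(0, 10)], [(0, 20)], [(50, 100)]))
```

Expanding intervals into position sets keeps the sketch short but is memory-hungry for whole genomes; an interval-overlap calculation would scale better.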

A Salmonid EST Genomic Study: Genes, Duplications, Phylogeny and Microarrays

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2008
Abstract: 

Background: Salmonids are of interest because of their relatively recent genome duplication and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most widely studied groups of fish.

Results: 298,304 expressed sequence tags (ESTs) from Atlantic salmon (69% of the total), 11,664 chinook, 10,813 sockeye, 10,051 brook trout, 10,975 grayling, 8,630 lake whitefish, and 3,624 northern pike ESTs were obtained in this study and have been deposited into the public databases. Contigs were built and putative full-length Atlantic salmon clones have been identified. A database containing ESTs, assemblies, consensus sequences, open reading frames, gene predictions and putative annotation is available. The overall similarities between Atlantic salmon ESTs and those of rainbow trout, chinook, sockeye, brook trout, grayling, lake whitefish, northern pike and rainbow smelt are 93.4, 94.2, 94.6, 94.4, 92.5, 91.7, 89.6, and 86.2%, respectively. An analysis of 78 transcript sets shows Salmo as a sister group to Oncorhynchus and Salvelinus within Salmoninae, and Thymallinae as a sister group to Salmoninae and Coregoninae within Salmonidae. Extensive gene duplication is consistent with a genome duplication in the common ancestor of salmonids. Using all of the available EST data, a new expanded salmonid cDNA microarray of 32,000 features was created. Cross-species hybridizations to this cDNA microarray indicate that this resource will be useful for studies of all 68 salmonid species.

Conclusion: An extensive collection and analysis of putative salmonid RNA transcripts indicates that Pacific salmon, Atlantic salmon and charr are 94–96% similar, while the more distant whitefish, grayling, pike and smelt are 93, 92, 89 and 86% similar to salmon. The salmonid transcriptome reveals a complex history of gene duplication that is consistent with an ancestral salmonid genome duplication hypothesis. Genome resources, including a new 32 K microarray, provide valuable new tools to study salmonids.

Document type: 
Article

Oligonucleotide Array Comparative Genomic Hybridization (oaCGH) Based Characterization of Genetic Deficiencies as an Aid to Gene Mapping in Caenorhabditis Elegans

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2007
Abstract: 

Background: A collection of genetic deficiencies covering over 70% of the Caenorhabditis elegans genome exists; however, the application of these valuable biological tools has been limited due to the incomplete correlation between their genetic and physical characterization.

Results: We have applied oligonucleotide array Comparative Genomic Hybridization (oaCGH) to the high-resolution, molecular characterization of several genetic deficiency and duplication strains in a 5 Mb region of Chromosome III. We incorporate these data into a physical deficiency map, which is subsequently used to direct the positional cloning of essential genes within the region. From this analysis we are able to quickly determine the molecular identity of several previously unidentified mutations.

Conclusion: We have applied accurate, high-resolution molecular analysis to the characterization of genetic mapping tools in Caenorhabditis elegans. Consequently, we have generated a valuable physical mapping resource, which we have demonstrated can aid in the rapid molecular identification of mutations of interest.

Document type: 
Article
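In array CGH, a deficiency typically appears as a run of adjacent probes whose test/reference log2 signal ratio is strongly negative. The sketch below is a simplified, generic illustration of calling such deleted segments; it is not the authors' oaCGH analysis pipeline, and the threshold of -0.8, the minimum of three consecutive probes, and the function name `call_deletions` are hypothetical choices.

```python
from typing import List, Tuple

# (probe position in bp, log2(test / reference) signal ratio)
Probe = Tuple[int, float]


def call_deletions(probes: List[Probe],
                   threshold: float = -0.8,
                   min_probes: int = 3) -> List[Tuple[int, int]]:
    """Return (first_probe, last_probe) spans of consecutive low-ratio probes.

    `probes` must be sorted by position. A run of at least `min_probes`
    probes at or below `threshold` is reported as a putative deletion.
    Both defaults are illustrative, not values from the study.
    """
    segments: List[Tuple[int, int]] = []
    run: List[int] = []
    for position, ratio in probes:
        if ratio <= threshold:
            run.append(position)
            continue
        # Ratio back near zero: close the current run if it is long enough
        if len(run) >= min_probes:
            segments.append((run[0], run[-1]))
        run = []
    # Handle a run that extends to the last probe
    if len(run) >= min_probes:
        segments.append((run[0], run[-1]))
    return segments


example = [(100, 0.1), (200, -1.5), (300, -1.2), (400, -2.0),
           (500, 0.0), (600, -1.0)]
print(call_deletions(example))  # [(200, 400)]
```

Requiring several consecutive probes trades sensitivity for robustness: a single noisy probe (such as the one at position 600 here) is not reported as a deletion.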

GExplore: a Web Server for Integrated Queries of Protein Domains, Gene Expression and Mutant Phenotypes

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2009
Abstract: 

Background: Even in well-studied multicellular model organisms, the majority of genes have not yet been functionally characterized. Mining the numerous genome-wide data sets related to protein function to retrieve potential candidate genes for a particular biological process remains a challenge.

Description: GExplore has been developed to provide a user-friendly database interface for data mining at the gene expression/protein function level to help in hypothesis development and experiment design. It supports combinatorial searches for proteins with certain domains, tissue- or developmental stage-specific expression patterns, and mutant phenotypes. GExplore operates on a stand-alone database and has fast response times, which is essential for exploratory searches. The interface is not only user-friendly but also modular, so that it can accommodate additional data sets in the future.

Conclusion: GExplore is an online database for quick mining of data related to gene and protein function, providing a multi-gene display of data sets related to the domain composition of proteins as well as expression and phenotype data. GExplore is publicly available at: http://genome.sfu.ca/gexplore/

Document type: 
Article
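The combinatorial searches described for GExplore amount to intersecting several filters (protein domain, expression site, mutant phenotype). A minimal in-memory sketch of that idea follows; the `Gene` record, the `search` function, and the example gene, domain, tissue and phenotype labels are hypothetical and do not reflect GExplore's actual database schema or interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Gene:
    """Hypothetical per-gene record combining three kinds of annotation."""
    name: str
    domains: Set[str] = field(default_factory=set)
    tissues: Set[str] = field(default_factory=set)
    phenotypes: Set[str] = field(default_factory=set)


def search(genes: List[Gene],
           domain: Optional[str] = None,
           tissue: Optional[str] = None,
           phenotype: Optional[str] = None) -> List[str]:
    """Return names of genes matching every criterion that is given (logical AND).

    A criterion left as None is not applied.
    """
    hits = []
    for gene in genes:
        if domain is not None and domain not in gene.domains:
            continue
        if tissue is not None and tissue not in gene.tissues:
            continue
        if phenotype is not None and phenotype not in gene.phenotypes:
            continue
        hits.append(gene.name)
    return hits


genes = [
    Gene("geneA", {"PDZ"}, {"neurons"}, {"uncoordinated"}),
    Gene("geneB", {"PDZ"}, {"muscle"}),
    Gene("geneC", {"SH3"}, {"neurons"}, {"uncoordinated"}),
]
print(search(genes, domain="PDZ", tissue="neurons"))  # ['geneA']
```

Adding a criterion can only narrow the result, which is what makes this style of query convenient for exploratory candidate-gene searches.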

Expression and Genomic Organization of Zonadhesin-Like Genes in Three Species of Fish Give Insight into the Evolutionary History of a Mosaic Protein

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2005
Abstract: 

Background: The mosaic sperm protein zonadhesin (ZAN) has been characterized in mammals and is implicated in species-specific egg-sperm binding interactions. The genomic structure and testes-specific expression of zonadhesin are known for many mammalian species. All zonadhesin genes characterized to date consist of meprin/A5 antigen/receptor tyrosine phosphatase mu (MAM) domains, mucin tandem repeats, and von Willebrand (VWD) adhesion domains. Here we investigate the genomic structure and expression of zonadhesin-like genes in three species of fish.

Results: The cDNA and corresponding genomic locus of a zonadhesin-like gene (zlg) in Atlantic salmon (Salmo salar) were sequenced. Zlg is similar in adhesion domain content to mammalian zonadhesin; however, the domain order is altered. Analysis of puffer fish (Takifugu rubripes) and zebrafish (Danio rerio) sequence data identified zonadhesin (zan) genes that share the same domain order and content, and a conserved syntenic relationship, with mammalian zonadhesin. A zonadhesin-like gene in D. rerio was also identified. Unlike mammalian zonadhesin, D. rerio zan and S. salar zlg were expressed in the gut and not in the testes.

Conclusion: We characterized likely orthologs of zonadhesin in both T. rubripes and D. rerio and uncovered zonadhesin-like genes in S. salar and D. rerio. Each of these genes contains MAM, mucin, and VWD domains. While these domains are associated with several proteins that show prominent gut expression, their combination is unique to zonadhesin and zonadhesin-like genes in vertebrates. The expression patterns of fish zonadhesin and zonadhesin-like genes suggest that the reproductive role of zonadhesin evolved later in the mammalian lineage.

Document type: 
Article

Efficient Assembly of Very Short Oligonucleotides Using T4 DNA Ligase

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2010
Abstract: 

Background: In principle, a pre-constructed library of all possible short oligonucleotides could be used to construct many distinct gene sequences. To assess the feasibility of such an approach, we characterized T4 DNA Ligase activity on short oligonucleotide substrates and defined conditions suitable for assembly of a plurality of oligonucleotides.

Findings: Ligation by T4 DNA Ligase was found to be dependent on the formation of a double-stranded DNA duplex of at least five base pairs surrounding the site of ligation. However, ligations could be performed effectively with overhangs smaller than five base pairs and oligonucleotides as small as octamers, in the presence of a second, complementary oligonucleotide. We demonstrate the feasibility of simultaneous oligonucleotide phosphorylation and ligation and, as a proof of principle for DNA synthesis through the assembly of short oligonucleotides, we performed a hierarchical ligation procedure whereby octamers were combined to construct a target 128-bp segment of the beta-actin gene.

Conclusions: Oligonucleotides as short as 8 nucleotides can be efficiently assembled using T4 DNA Ligase. Thus, the construction of synthetic genes, without the need for custom oligonucleotide synthesis, appears feasible.

Document type: 
Article
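The hierarchical ligation in the T4 DNA Ligase record can be pictured as successive rounds of joining neighboring fragments. Assuming pairwise joining of adjacent fragments in each round (the abstract does not specify the scheme), a 128-bp target built from octamers starts as 128 / 8 = 16 fragments and needs log2(16) = 4 rounds. The sketch below tracks only the top-strand sequence; it ignores the complementary oligonucleotides, phosphorylation and the five-base-pair duplex requirement, and the function name `hierarchical_assembly` is hypothetical.

```python
from typing import List, Tuple


def hierarchical_assembly(target: str, unit: int = 8) -> Tuple[str, int]:
    """Split `target` into `unit`-length oligos, then join neighbors pairwise.

    Returns the final product and the number of ligation rounds used.
    Only the top-strand sequence is tracked.
    """
    if not target:
        raise ValueError("target sequence must be non-empty")
    fragments: List[str] = [target[i:i + unit] for i in range(0, len(target), unit)]
    rounds = 0
    while len(fragments) > 1:
        # Each round joins fragment 2k to fragment 2k+1; an odd last fragment carries over
        fragments = ["".join(fragments[i:i + 2]) for i in range(0, len(fragments), 2)]
        rounds += 1
    return fragments[0], rounds


# Placeholder 128-nt sequence (not the beta-actin segment used in the study)
target = "ACGT" * 32
product, rounds = hierarchical_assembly(target)
print(len(product), rounds)  # 128 4
```

Under this pairwise scheme the number of rounds grows only logarithmically with the number of octamers, which is one reason a hierarchical procedure is attractive for longer targets.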

Functional Characterization in Caenorhabditis Elegans of Transmembrane Worm-Human Orthologs

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2004
Abstract: 

Background: The complete genome sequences for human and the nematode Caenorhabditis elegans offer an opportunity to learn more about human gene function through functional characterization of orthologs in the worm. Based on a previous genome-wide analysis of worm-human orthologous transmembrane proteins, we selected seventeen genes to explore experimentally in C. elegans. These genes were selected because they all have high-confidence candidate human orthologs and their function is unknown. We first analyzed their phylogeny, membrane topology and domain organization. Then gene functions were studied experimentally in the worm by using RNA interference and transcriptional gfp reporter gene fusions.

Results: The experiments gave functional insights for twelve of the genes studied. For example, C36B1.12, the worm ortholog of three presenilin-like genes, was almost exclusively expressed in head neurons, suggesting an ancient conserved role important to neuronal function. We propose a new transmembrane topology for the presenilin-like protein family. sft-4, the worm ortholog of surfeit locus gene Surf-4, proved to be an essential gene required for development during the larval stages of the worm. R155.1, whose human ortholog is entirely uncharacterized, was implicated in body size control and other developmental processes.

Conclusions: By combining bioinformatics and C. elegans experiments on orthologs, we provide functional insights on twelve previously uncharacterized human genes.

Document type: 
Article

Comprehensive Analysis of Gene Expression Patterns of Hedgehog-Related Genes

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2006
Abstract: 

Background: The Caenorhabditis elegans genome encodes ten proteins that share sequence similarity with the Hedgehog signaling molecule through their C-terminal autoprocessing Hint/Hog domain. These proteins contain novel N-terminal domains, and C. elegans encodes dozens of additional proteins containing only these N-terminal domains. These gene families are called warthog, groundhog, ground-like and quahog, collectively called hedgehog (hh)-related genes. Previously, the expression patterns of seventeen genes were examined, showing that they are primarily expressed in the ectoderm.

Results: With the completion of the C. elegans genome sequence in November 2002, we reexamined and identified 61 hh-related ORFs. Further, we identified 49 hh-related ORFs in C. briggsae. ORF analysis revealed that 30% of the genes still had errors in their predictions, and we improved these predictions here. We performed a comprehensive expression analysis using GFP fusions of the putative intergenic regulatory sequence, with one or two transgenic lines for most genes. The hh-related genes are expressed in one or a few of the following tissues: hypodermis, seam cells, excretory duct and pore cells, vulval epithelial cells, rectal epithelial cells, pharyngeal muscle or marginal cells, arcade cells, support cells of sensory organs, and neuronal cells. Using time-lapse recordings, we discovered that some hh-related genes are expressed in a cyclical fashion in phase with molting during larval development. We also generated several translational GFP fusions, but they did not show any subcellular localization. In addition, we studied the expression patterns of two genes with similarity to Drosophila frizzled, T23D8.1 and F27E11.3A, and the ortholog of the Drosophila gene dally-like, gpn-1, which is a heparan sulfate proteoglycan. The two frizzled homologs are expressed in a few neurons in the head, and gpn-1 is expressed in the pharynx. Finally, we compare the efficacy of our GFP expression effort with EST, OST and SAGE data.

Conclusion: No bona fide Hh signaling pathway is present in C. elegans. Given that the hh-related gene products have a predicted signal peptide for secretion, it is possible that they constitute components of the extracellular matrix (ECM). They might be associated with the cuticle or be present in soluble form in the body cavity. They might interact with the Patched or the Patched-related proteins in a manner similar to the interaction of Hedgehog with its receptor Patched.

Document type: 
Article

Improving the Specificity of High-Throughput Ortholog Prediction

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2006
Abstract: 

Background: Orthologs (genes that have diverged after a speciation event) tend to have similar function, and so their prediction has become an important component of comparative genomics and genome annotation. The gold-standard phylogenetic approach of comparing the available organismal phylogeny to the gene phylogeny is not easily automated for genome-wide analysis; therefore, ortholog prediction for large genome-scale datasets is typically performed using a reciprocal-best-BLAST-hits (RBH) approach. One problem with RBH is that it will incorrectly predict a paralog as an ortholog when incomplete genome sequences or gene loss is involved. In addition, there is increasing interest in identifying the orthologs most likely to have retained similar function.

Results: To address these issues, we present here a high-throughput computational method named Ortholuge that further evaluates previously predicted orthologs (including those predicted using an RBH-based approach), identifying which orthologs most closely reflect species divergence and may be more likely to have similar function. Ortholuge analyzes phylogenetic distance ratios involving two comparison species and an outgroup species, noting cases where relative gene divergence is atypical. It also identifies some cases of gene duplication after species divergence. Through simulations of incomplete genome data/gene loss, we show that the vast majority of genes falsely predicted as orthologs by an RBH-based method can be identified. Ortholuge was then used to estimate the number of false positives (predominantly paralogs) in selected RBH-predicted ortholog datasets, identifying approximately 10% paralogs in a eukaryotic data set (mouse-rat comparison) and 5% in a bacterial data set (Pseudomonas putida – Pseudomonas syringae species comparison). Higher quality (more precise) datasets of orthologs, which we term "ssd-orthologs" (supporting-species-divergence-orthologs), were also constructed. These datasets, as well as Ortholuge software that may be used to characterize other species' datasets, are available at http://www.pathogenomics.ca/ortholuge/ (software under GNU General Public License).

Conclusion: The Ortholuge method reported here appears to significantly improve the specificity (precision) of high-throughput ortholog prediction for both bacterial and eukaryotic species. This method, and its associated software, will aid those performing various comparative genomics-based analyses, such as the prediction of conserved regulatory elements upstream of orthologous genes.

Document type: 
Article
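Two ideas from the Ortholuge record can be sketched briefly. The first function below implements plain reciprocal best hits over precomputed similarity scores (for example, BLAST bit scores); if a gene's true ortholog is missing from the other genome, its best reciprocal partner may be a paralog, which is the failure mode the abstract describes. The second function is a simplified stand-in for the distance-ratio idea: it flags pairs whose ingroup distance is large relative to the outgroup distance. It is not Ortholuge's actual statistical procedure, which the abstract does not detail; the median-based cutoff, the factor of 2, and all function and gene names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def reciprocal_best_hits(a_to_b: Dict[str, Dict[str, float]],
                         b_to_a: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's top-scoring hit and a is b's top-scoring hit."""
    best_ab = {a: max(hits, key=hits.get) for a, hits in a_to_b.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in b_to_a.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def flag_atypical(trios: Dict[str, Tuple[float, float]], factor: float = 2.0) -> List[str]:
    """Flag ortholog pairs whose ingroup divergence is unusually high.

    `trios` maps a pair ID to (d_ab, d_ac): the distance between the two
    ingroup genes and the distance from one ingroup gene to the outgroup gene.
    A pair is flagged when d_ab / d_ac exceeds `factor` times the median ratio.
    """
    ratios = {pid: d_ab / d_ac for pid, (d_ab, d_ac) in trios.items()}
    cutoff = factor * median(ratios.values())
    return sorted(pid for pid, r in ratios.items() if r > cutoff)


# Hypothetical scores: a2's best hit is b1, but b1's best hit is a1
a_to_b = {"a1": {"b1": 100.0, "b2": 50.0}, "a2": {"b1": 80.0, "b2": 60.0}}
b_to_a = {"b1": {"a1": 100.0, "a2": 80.0}, "b2": {"a1": 50.0, "a2": 60.0}}
print(reciprocal_best_hits(a_to_b, b_to_a))  # [('a1', 'b1')]

# Hypothetical distances: p4's ingroup genes are far apart relative to the outgroup
trios = {"p1": (0.10, 1.0), "p2": (0.12, 1.0), "p3": (0.11, 1.0), "p4": (0.50, 1.0)}
print(flag_atypical(trios))  # ['p4']
```

Comparing each pair against the genome-wide typical ratio, rather than a fixed distance, is what lets a relative-divergence test adapt to species pairs that diverged at different times.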

Ebbie: Automated Analysis and Storage of Small RNA Cloning Data Using a Dynamic Web Server

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2006
Abstract: 

BACKGROUND: DNA sequencing is used ubiquitously: from deciphering genomes [1] to determining the primary sequence of small RNAs (smRNAs) [2-5]. The cloning of smRNAs is currently the most conventional method to determine the actual sequence of these important regulators of gene expression. Typical smRNA cloning projects involve the sequencing of hundreds to thousands of smRNA clones that are delimited at their 5' and 3' ends by fixed sequence regions. These primer sequences result from the biochemical protocol used to isolate and convert the smRNA into clonable PCR products. Recently we completed a smRNA cloning project involving tobacco plants, where analysis was required for ~700 smRNA sequences [6]. Finding no easily accessible research tool to enter and analyze smRNA sequences, we developed Ebbie to assist us with our study.

RESULTS: Ebbie is a semi-automated smRNA cloning data processing algorithm, which initially searches a DNA sequencing text file for any substring flanked by two constant strings. The substring, also termed smRNA or insert, is stored in a MySQL and BlastN database. These inserts are then compared using BlastN to locally installed databases, allowing rapid comparison of each insert to both the growing smRNA database and other static sequence databases. Our laboratory used Ebbie to analyze scores of DNA sequencing data originating from an smRNA cloning project [6]. Through its built-in instant analysis of all inserts using BlastN, we were able to quickly identify 33 groups of smRNAs from ~700 database entries. This clustering allowed the easy identification of novel and highly expressed clusters of smRNAs. Ebbie is available under the GNU GPL and is currently implemented on http://bioinformatics.org/ebbie/

CONCLUSION: Ebbie was designed for medium-sized smRNA cloning projects with about 1,000 database entries [6-8]. Ebbie can be used for any type of sequence analysis where two constant primer regions flank a sequence of interest. The reliable storage and annotation of inserts in a MySQL database, together with BlastN [9] comparison of new inserts to dynamic and static databases, make it a powerful new tool in any laboratory using DNA sequencing. Ebbie also prevents manual mistakes during the excision process and speeds up annotation and data entry. Once the server is installed locally, access can be restricted to protect sensitive new DNA sequencing data. Ebbie was primarily designed for smRNA cloning projects, but can be applied to a variety of RNA and DNA cloning projects [2,3,10,11].

Document type: 
Article
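The first step Ebbie performs, finding every substring flanked by two constant strings, can be illustrated with plain string searching. The sketch below is an assumption-laden stand-in, not Ebbie's implementation (the abstract does not describe its code): it uses exact matching only, so sequencing errors in the flanks and inserts read in reverse-complement orientation are not handled. The function name `extract_inserts`, the flanking sequences and the example read are hypothetical.

```python
from typing import List


def extract_inserts(read: str, left: str, right: str) -> List[str]:
    """Return every substring of `read` flanked by `left` and `right`.

    Scans left to right, so several inserts in one read are all recovered.
    Exact matching only: mismatches in the flanks or inserts in
    reverse-complement orientation are not handled.
    """
    if not left or not right:
        raise ValueError("flanking strings must be non-empty")
    inserts: List[str] = []
    pos = 0
    while True:
        i = read.find(left, pos)
        if i == -1:
            break
        start = i + len(left)
        j = read.find(right, start)
        if j == -1:
            break
        inserts.append(read[start:j])
        pos = j + len(right)
    return inserts


# Hypothetical flanks and read; real primer sequences depend on the cloning protocol
read = "NNNGATCAAAACCCCGGGGTTTTCTAGNNGATCACGTCTAGNN"
print(extract_inserts(read, "GATC", "CTAG"))  # ['AAAACCCCGGGGTTTT', 'ACGT']
```

Each extracted insert could then be stored and compared against local databases, as the abstract describes for Ebbie's MySQL and BlastN steps.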