Molecular Biology and Biochemistry, Department of

Comprehensive Analysis of Gene Expression Patterns of Hedgehog-Related Genes

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2006
Abstract: 

Background: The Caenorhabditis elegans genome encodes ten proteins that share sequence similarity with the Hedgehog signaling molecule through their C-terminal autoprocessing Hint/Hog domain. These proteins contain novel N-terminal domains, and C. elegans encodes dozens of additional proteins containing only these N-terminal domains. These gene families are called warthog, groundhog, ground-like and quahog, collectively called hedgehog (hh)-related genes. Previously, the expression pattern of seventeen genes was examined, which showed that they are primarily expressed in the ectoderm.
Results: With the completion of the C. elegans genome sequence in November 2002, we reexamined and identified 61 hh-related ORFs. Further, we identified 49 hh-related ORFs in C. briggsae. ORF analysis revealed that 30% of the genes still had errors in their predictions and we improved these predictions here. We performed a comprehensive expression analysis using GFP fusions of the putative intergenic regulatory sequence with one or two transgenic lines for most genes. The hh-related genes are expressed in one or a few of the following tissues: hypodermis, seam cells, excretory duct and pore cells, vulval epithelial cells, rectal epithelial cells, pharyngeal muscle or marginal cells, arcade cells, support cells of sensory organs, and neuronal cells. Using time-lapse recordings, we discovered that some hh-related genes are expressed in a cyclical fashion in phase with molting during larval development. We also generated several translational GFP fusions, but they did not show any subcellular localization. In addition, we also studied the expression patterns of two genes with similarity to Drosophila frizzled, T23D8.1 and F27E11.3A, and the ortholog of the Drosophila gene dally-like, gpn-1, which is a heparan sulfate proteoglycan. The two frizzled homologs are expressed in a few neurons in the head, and gpn-1 is expressed in the pharynx. Finally, we compare the efficacy of our GFP expression effort with EST, OST and SAGE data.
Conclusion: No bona-fide Hh signaling pathway is present in C. elegans. Given that the hh-related gene products have a predicted signal peptide for secretion, it is possible that they constitute components of the extracellular matrix (ECM). They might be associated with the cuticle or be present in soluble form in the body cavity. They might interact with the Patched or the Patched-related proteins in a manner similar to the interaction of Hedgehog with its receptor Patched.

Document type: 
Article

Improving the Specificity of High-Throughput Ortholog Prediction

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2006
Abstract: 

Background: Orthologs (genes that have diverged after a speciation event) tend to have similar function, and so their prediction has become an important component of comparative genomics and genome annotation. The gold standard phylogenetic analysis approach of comparing available organismal phylogeny to gene phylogeny is not easily automated for genome-wide analysis; therefore, ortholog prediction for large genome-scale datasets is typically performed using a reciprocal-best-BLAST-hits (RBH) approach. One problem with RBH is that it will incorrectly predict a paralog as an ortholog when incomplete genome sequences or gene loss is involved. In addition, there is an increasing interest in identifying orthologs most likely to have retained similar function.
Results: To address these issues, we present here a high-throughput computational method named Ortholuge that further evaluates previously predicted orthologs (including those predicted using an RBH-based approach), identifying which orthologs most closely reflect species divergence and may more likely have similar function. Ortholuge analyzes phylogenetic distance ratios involving two comparison species and an outgroup species, noting cases where relative gene divergence is atypical. It also identifies some cases of gene duplication after species divergence. Through simulations of incomplete genome data/gene loss, we show that the vast majority of genes falsely predicted as orthologs by an RBH-based method can be identified. Ortholuge was then used to estimate the number of false-positives (predominantly paralogs) in selected RBH-predicted ortholog datasets, identifying approximately 10% paralogs in a eukaryotic data set (mouse-rat comparison) and 5% in a bacterial data set (Pseudomonas putida – Pseudomonas syringae species comparison). Higher quality (more precise) datasets of orthologs, which we term "ssd-orthologs" (supporting-species-divergence-orthologs), were also constructed. These datasets, as well as Ortholuge software that may be used to characterize other species' datasets, are available at http://www.pathogenomics.ca/ortholuge/ (software under GNU General Public License).
Conclusion: The Ortholuge method reported here appears to significantly improve the specificity (precision) of high-throughput ortholog prediction for both bacterial and eukaryotic species. This method, and its associated software, will aid those performing various comparative genomics-based analyses, such as the prediction of conserved regulatory elements upstream of orthologous genes.
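
The distance-ratio idea in the abstract above can be illustrated with a minimal Python sketch. It is not the Ortholuge implementation: the abstract only states that ratios involving two comparison species and an outgroup are analyzed, so the exact ratio definitions, the median-based cutoff, the `fold` parameter, and the function names below are all assumptions made for illustration.

```python
from statistics import median


def distance_ratios(d_in1_in2, d_in1_out, d_in2_out):
    """Return two ratios of ingroup distance to outgroup distance.

    Assumed definitions (not taken from the abstract):
      ratio1 = d(ingroup1, ingroup2) / d(ingroup1, outgroup)
      ratio2 = d(ingroup1, ingroup2) / d(ingroup2, outgroup)
    """
    if d_in1_out <= 0 or d_in2_out <= 0:
        raise ValueError("outgroup distances must be positive")
    return d_in1_in2 / d_in1_out, d_in1_in2 / d_in2_out


def flag_atypical(triples, fold=3.0):
    """Flag predicted orthologs whose ratios are unusually large.

    triples maps a gene id to (d_in1_in2, d_in1_out, d_in2_out).
    A gene is flagged when either ratio exceeds `fold` times the
    median of that ratio across all genes (a hypothetical cutoff).
    """
    ratios = {gene: distance_ratios(*d) for gene, d in triples.items()}
    med1 = median(r[0] for r in ratios.values())
    med2 = median(r[1] for r in ratios.values())
    return sorted(
        gene
        for gene, (r1, r2) in ratios.items()
        if r1 > fold * med1 or r2 > fold * med2
    )


# Made-up distances: geneD's ingroup distance is as large as its
# outgroup distances, the pattern expected for a mispredicted paralog.
example = {
    "geneA": (0.10, 0.50, 0.52),
    "geneB": (0.12, 0.55, 0.50),
    "geneC": (0.09, 0.48, 0.47),
    "geneD": (0.60, 0.55, 0.58),
}
print(flag_atypical(example))
```

A median-relative cutoff is used here only because it needs no distributional assumptions; any real analysis would choose its threshold from the observed ratio distribution.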

Document type: 
Article

Ebbie: Automated Analysis and Storage of Small RNA Cloning Data Using a Dynamic Web Server

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2006
Abstract: 

BACKGROUND: DNA sequencing is used ubiquitously: from deciphering genomes [1] to determining the primary sequence of small RNAs (smRNAs) [2-5]. The cloning of smRNAs is currently the most conventional method to determine the actual sequence of these important regulators of gene expression. Typical smRNA cloning projects involve the sequencing of hundreds to thousands of smRNA clones that are delimited at their 5' and 3' ends by fixed sequence regions. These primers result from the biochemical protocol used to isolate and convert the smRNA into clonable PCR products. Recently we completed a smRNA cloning project involving tobacco plants, where analysis was required for ~700 smRNA sequences [6]. Finding no easily accessible research tool to enter and analyze smRNA sequences, we developed Ebbie to assist us with our study.
RESULTS: Ebbie is a semi-automated smRNA cloning data processing algorithm, which initially searches a DNA sequencing text file for any substring that is flanked by two constant strings. The substring, also termed smRNA or insert, is stored in a MySQL and BlastN database. These inserts are then compared using BlastN to locally installed databases, allowing the rapid comparison of the insert to both the growing smRNA database and to other static sequence databases. Our laboratory used Ebbie to analyze scores of DNA sequencing data originating from an smRNA cloning project [6]. Through its built-in instant analysis of all inserts using BlastN, we were able to quickly identify 33 groups of smRNAs from ~700 database entries. This clustering allowed the easy identification of novel and highly expressed clusters of smRNAs. Ebbie is available under GNU GPL and currently implemented on http://bioinformatics.org/ebbie/
CONCLUSION: Ebbie was designed for medium sized smRNA cloning projects with about 1,000 database entries [6-8]. Ebbie can be used for any type of sequence analysis where two constant primer regions flank a sequence of interest. The reliable storage of inserts and their annotation in a MySQL database, together with BlastN [9] comparison of new inserts to dynamic and static databases, make it a powerful new tool in any laboratory using DNA sequencing. Ebbie also prevents manual mistakes during the excision process and speeds up annotation and data-entry. Once the server is installed locally, its access can be restricted to protect sensitive new DNA sequencing data. Ebbie was primarily designed for smRNA cloning projects, but can be applied to a variety of RNA and DNA cloning projects [2,3,10,11].
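
The excision step described above, finding a substring flanked by two constant strings, can be sketched in a few lines of Python. This is an illustration only, not Ebbie's code: the function name and the two flanking sequences are hypothetical, and the storage and BlastN steps are omitted.

```python
import re


def extract_inserts(read, left_flank, right_flank):
    """Return every insert found between the two constant flanks.

    The match is non-greedy, so a read that contains several
    flank-insert-flank units yields one insert per unit.
    """
    pattern = re.compile(
        re.escape(left_flank.upper())
        + r"([ACGTN]+?)"
        + re.escape(right_flank.upper())
    )
    return pattern.findall(read.upper())


# Hypothetical flanking sequences, chosen only for this example.
LEFT = "AAGCTT"
RIGHT = "GAATTC"

single = "NNNN" + LEFT + "TGACCGATCG" + RIGHT + "NNNN"
double = LEFT + "ACGTACGT" + RIGHT + LEFT + "TTTTGGGG" + RIGHT

print(extract_inserts(single, LEFT, RIGHT))
print(extract_inserts(double, LEFT, RIGHT))
```

Escaping the flanks with `re.escape` keeps the sketch safe if a flank ever contains a character that is special in regular expressions; a read with no complete flank pair simply returns an empty list.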

Document type: 
Article

Evidence of Balanced Diversity at the Chicken Interleukin 4 Receptor Alpha Chain Locus

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2009
Abstract: 

Background: The comparative analysis of genome sequences emerging for several avian species with the fully sequenced chicken genome enables the genome-wide investigation of selective processes in functionally important chicken genes. In particular, because of pathogenic challenges it is expected that genes involved in the chicken immune system are subject to particularly strong adaptive pressure. Signatures of selection detected by inter-species comparison may then be investigated at the population level in global chicken populations to highlight potentially relevant functional polymorphisms.
Results: Comparative evolutionary analysis of chicken (Gallus gallus) and zebra finch (Taeniopygia guttata) genes identified interleukin 4 receptor alpha-chain (IL-4Rα), a key cytokine receptor, as a candidate with a significant excess of substitutions at nonsynonymous sites, suggestive of adaptive evolution. Resequencing and detailed population genetic analysis of this gene in diverse village chickens from Asia and Africa, commercial broilers, and in outgroup species red jungle fowl (JF), grey JF, Ceylon JF, green JF, grey francolin and bamboo partridge, suggested elevated and balanced diversity across all populations at this gene, acting to preserve different high-frequency alleles at two nonsynonymous sites.
Conclusion: Haplotype networks indicate that red JF is the primary contributor of diversity at chicken IL-4Rα: the signature of variation observed here may be due to the effects of domestication, admixture and introgression, which produce high diversity. However, this gene is a key cytokine-binding receptor in the immune system, so balancing selection related to the host response to pathogens cannot be excluded.

Document type: 
Article

Bursts and Horizontal Evolution of DNA Transposons in the Speciation of Pseudotetraploid Salmonids

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2007
Abstract: 

Background: Several genome duplications have occurred in the evolutionary history of teleost fish. In returning to a stable diploid state, the polyploid genome reorganized, and large portions are lost, while the fish lines evolved to numerous species. Large scale transposon movement has been postulated to play an important role in the genome reorganization process. We analyzed the DNA sequence of several large loci in Salmo salar and other species for the presence of DNA transposon families.
Results: We have identified bursts of activity of 14 families of DNA transposons (12 Tc1-like and 2 piggyBac-like families, including 11 novel ones) in genome sequences of Salmo salar. Several of these families have similar sequences in a number of closely and distantly related fish, lamprey, and frog species as well as in the parasite Schistosoma japonicum. Analysis of sequence similarities between copies within the families of these bursts demonstrates several waves of transposition activities coinciding with salmonid species divergence. Tc1-like families show a master gene-like copying process, illustrated by extensive but short bursts of copying activity, while the piggyBac-like families show a more random copying pattern. Recent families may include copies with an open reading frame for an active transposase enzyme.
Conclusion: We have identified defined bursts of transposon activity that make use of master-slave and random mechanisms. The bursts occur well after hypothesized polyploidy events and coincide with speciation events. Parasite-mediated lateral transfer of transposons is implicated.

Document type: 
Article

Sequencing the Genome of the Atlantic Salmon (Salmo salar)

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2010
Abstract: 

The International Collaboration to Sequence the Atlantic Salmon Genome (ICSASG) will produce a genome sequence that identifies and physically maps all genes in the Atlantic salmon genome and acts as a reference sequence for other salmonids.

Document type: 
Article

Distribution of Ancestral Proto-Actinopterygian Chromosome Arms within the Genomes of 4R-Derivative Salmonid Fishes (Rainbow Trout and Atlantic Salmon)

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2008
Abstract: 

Background: Comparative genomic studies suggest that the modern day assemblage of ray-finned fishes have descended from an ancestral grouping of fishes that possessed 12–13 linkage groups. All jawed vertebrates are postulated to have experienced two whole genome duplications (WGD) in their ancestry (2R duplication). Salmonids have experienced one additional WGD (4R duplication event) compared to most extant teleosts which underwent a further 3R WGD compared to other vertebrates. We describe the organization of the 4R chromosomal segments of the proto-ray-finned fish karyotype in Atlantic salmon and rainbow trout based upon their comparative syntenies with two model species of 3R ray-finned fishes.
Results: Evidence is presented for the retention of large whole-arm affinities between the ancestral linkage groups of the ray-finned fishes, and the 50 homeologous chromosomal segments in Atlantic salmon and rainbow trout. In the comparisons between the two salmonid species, there is also evidence for the retention of large whole-arm homeologous affinities that are associated with the retention of duplicated markers. Five of the 7 pairs of chromosomal arm regions expressing the highest level of duplicate gene expression in rainbow trout share homologous synteny to the 5 pairs of homeologs with the greatest duplicate gene expression in Atlantic salmon. These regions are derived from proto-Actinopterygian linkage groups B, C, E, J and K.
Conclusion: Two chromosome arms in Danio rerio and Oryzias latipes (descendants of the 3R duplication) can, in most instances, be related to at least 4 whole or partial chromosomal arms in the salmonid species. Multiple arm assignments in the two salmonid species do not clearly support a 13 proto-linkage group model, and suggest that a 12 proto-linkage group arrangement (i.e., a separate single chromosome duplication and ancestral fusion/fissions/recombination within the putative G/H/I groupings) may have occurred in the more basal soft-rayed fishes. We also found evidence supporting the model that ancestral linkage group M underwent a single chromosome duplication following the 3R duplication. In the salmonids, the M ancestral linkage groups are localized to 5 whole arm, and 3 partial arm regions (i.e., 6 whole arm regions expected). Thus, 3 distinct ancestral linkage groups are postulated to have existed in the G/H and M lineage chromosomes in the ancestor of the salmonids.

Document type: 
Article

Convergent Evolution of RFX Transcription Factors and Ciliary Genes Predated the Origin of Metazoans

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2010
Abstract: 

Background: Intraflagellar transport (IFT) genes, which are critical for the development and function of cilia and flagella in metazoans, are tightly regulated by the Regulatory Factor X (RFX) transcription factors (TFs). However, how and when their evolutionary relationship was established remains unknown.
Results: We have identified evidence suggesting that RFX TFs and IFT genes evolved independently and their evolution converged before the first appearance of metazoans. Both ciliary genes and RFX TFs exist in all metazoans as well as some unicellular eukaryotes. However, while RFX TFs and IFT genes are found simultaneously in all sequenced metazoan genomes, RFX TFs do not co-exist with IFT genes in most pre-metazoans and thus do not regulate them in these organisms. For example, neither the budding yeast nor the fission yeast possesses cilia although both have well-defined RFX TFs. Conversely, most unicellular eukaryotes, including the green alga Chlamydomonas reinhardtii, have typical cilia and well conserved IFT genes but lack RFX TFs. Outside of metazoans, RFX TFs and IFT genes co-exist only in choanoflagellates including M. brevicollis, and only one fungus, Allomyces macrogynus, of the 51 sequenced fungus genomes. M. brevicollis has two putative RFX genes and a full complement of ciliary genes.
Conclusions: The evolution of RFX TFs and IFT genes was independent in pre-metazoans. We propose that their convergence in evolution, or the acquired transcriptional regulation of IFT genes by RFX TFs, played a pivotal role in the establishment of metazoans.

Document type: 
Article

Structural Characterization of Genomes by Large Scale Sequence-Structure Threading: Application of Reliability Analysis in Structural Genomics

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2004
Abstract: 

Background: We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment.
Results: We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms.
Conclusions: The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics.
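
For readers unfamiliar with the Weibull function mentioned above, the following Python sketch shows the standard two-parameter reliability form, R(t) = exp(-(t/η)^β), and how the shape parameter β can be estimated by a straight-line fit. The abstract does not give the authors' parameterization or fitting procedure, so the form, the linearized least-squares fit, the function names, and the synthetic data are all assumptions.

```python
import math


def weibull_reliability(t, beta, eta):
    """Standard two-parameter Weibull reliability: exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def fit_weibull(ts, rs):
    """Estimate (beta, eta) from points (t, R(t)) with 0 < R < 1.

    Uses the linearization ln(-ln R) = beta * ln t - beta * ln eta
    and an ordinary least-squares line through the transformed points.
    """
    xs = [math.log(t) for t in ts]
    ys = [math.log(-math.log(r)) for r in rs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    eta = math.exp(-intercept / beta)
    return beta, eta


# Synthetic, noise-free data generated from known parameters.
ts = [1.0, 2.0, 5.0, 10.0, 20.0]
rs = [weibull_reliability(t, beta=0.7, eta=8.0) for t in ts]
beta_hat, eta_hat = fit_weibull(ts, rs)
print(beta_hat, eta_hat)
```

On noise-free input the fit recovers the generating parameters; with real counts, the quality of the straight-line fit in the transformed coordinates is one way to judge whether a Weibull description is reasonable.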

Document type: 
Article

Identification of Ciliary and Ciliopathy Genes in Caenorhabditis elegans through Comparative Genomics

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2006
Abstract: 

Background: The recent availability of genome sequences of multiple related Caenorhabditis species has made it possible to identify, using comparative genomics, similarly transcribed genes in Caenorhabditis elegans and its sister species. Taking this approach, we have identified numerous novel ciliary genes in C. elegans, some of which may be orthologs of unidentified human ciliopathy genes.
Results: By screening for genes possessing canonical X-box sequences in promoters of three Caenorhabditis species, namely C. elegans, C. briggsae and C. remanei, we identified 93 genes (including known X-box regulated genes) that encode putative components of ciliated neurons in C. elegans and are subject to the same regulatory control. For many of these genes, restricted anatomical expression in ciliated cells was confirmed, and control of transcription by the ciliogenic DAF-19 RFX transcription factor was demonstrated by comparative transcriptional profiling of different tissue types and of daf-19(+) and daf-19(-) animals. Finally, we demonstrate that the dye-filling defect of dyf-5(mn400) animals, which is indicative of compromised exposure of cilia to the environment, is caused by a nonsense mutation in the serine/threonine protein kinase gene M04C9.5.
Conclusion: Our comparative genomics-based predictions may be useful for identifying genes involved in human ciliopathies, including Bardet-Biedl Syndrome (BBS), since the C. elegans orthologs of known human BBS genes contain X-box motifs and are required for normal dye filling in C. elegans ciliated neurons.

Document type: 
Article