Computing Science, School of

Video Game Telemetry as a Critical Tool in the Study of Complex Skill Learning

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2013
Abstract: 

Cognitive science has long shown interest in expertise, in part because prediction and control of expert development would have immense practical value. Most studies in this area investigate expertise by comparing experts with novices. The reliance on contrastive samples in studies of human expertise only yields deep insight into development where differences are important throughout skill acquisition. This reliance may be pernicious where the predictive importance of variables is not constant across levels of expertise. Before the development of sophisticated machine learning tools for data mining larger samples, and indeed, before such samples were available, it was difficult to test the implicit assumption of static variable importance in expertise development. To investigate if this reliance may have imposed critical restrictions on the understanding of complex skill development, we adopted an alternative method, the online acquisition of telemetry data from a common daily activity for many: video gaming. Using measures of cognitive-motor, attentional, and perceptual processing extracted from game data from 3360 Real-Time Strategy players at 7 different levels of expertise, we identified 12 variables relevant to expertise. We show that the static variable importance assumption is false - the predictive importance of these variables shifted as the levels of expertise increased - and, at least in our dataset, that a contrastive approach would have been misleading. The finding that variable importance is not static across levels of expertise suggests that large, diverse datasets of sustained cognitive-motor performance are crucial for an understanding of expertise in real-world contexts. We also identify plausible cognitive markers of expertise.

Document type: 
Article

Improvement and Performance Evaluation for Multimedia Files Transmission in Vehicle-Based DTNs

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2013
Abstract: 

In recent years, peer-to-peer (P2P) file sharing has been widely embraced and has become the largest contributor to Internet traffic. Meanwhile, the development of the automobile industry has promoted a trend of deploying P2P networks over vehicle ad hoc networks (VANETs) for mobile content distribution. Because of the high mobility of nodes, their limited radio transmission range, and their sparse distribution, VANETs become partitioned and links are interrupted intermittently. Under these conditions, a VANET may become a Vehicle-based Delay Tolerant Network (VDTN). This work therefore proposes an Optimal Fragmentation-based Multimedia Transmission scheme (OFMT), based on a P2P lookup protocol in VDTNs, which enables multimedia files to be sent to the receiver quickly and reliably in wireless mobile P2P networks over VDTNs. In addition, a method for calculating the most suitable fragment size is provided and is tested and verified in simulation. We also show that OFMT can withstand a certain degree of DoS attack and that senders can freely join and leave the wireless mobile P2P network. Simulation results demonstrate that the proposed scheme significantly improves the file delivery rate and shortens the file delivery delay compared with existing schemes.

Document type: 
Article

Barnacle: Detecting and Characterizing Tandem Duplications and Fusions in Transcriptome Assemblies

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2013
Abstract: 

Background

Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers.

Results

We describe Barnacle, a production-grade analysis tool that detects such chimeras in de novo assemblies of RNA-seq data, and supports prioritizing them for review and validation by reporting the relative coverage of co-occurring chimeric and wild-type transcripts. We demonstrate applications in large-scale disease studies, by identifying PTDs in MLL, ITDs in FLT3, and reciprocal fusions between PML and RARA, in two deeply sequenced acute myeloid leukemia (AML) RNA-seq datasets.

Conclusions

Our analyses of real and simulated data sets show that, with appropriate filter settings, Barnacle makes highly specific predictions for three types of chimeric transcripts that are important in a range of cancers: PTDs, ITDs, and fusions. High specificity makes manual review and validation efficient, which is necessary in large-scale disease studies. Characterizing an extended range of chimera types will help generate insights into progression, treatment, and outcomes for complex diseases.

Document type: 
Article

Analyzing The Impact Of Social Factors On Homelessness: A Fuzzy Cognitive Map Approach

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2013
Abstract: 

Background

The forces which affect homelessness are complex and often interactive in nature. Social forces such as addictions, family breakdown, and mental illness are compounded by structural forces such as lack of available low-cost housing, poor economic conditions, and insufficient mental health services. Together these factors impact levels of homelessness through their dynamic relations. Historic models, which are static in nature, have only been marginally successful in capturing these relationships.

Methods

Fuzzy logic (FL) and fuzzy cognitive maps (FCMs) are particularly suited to modeling complex social problems such as homelessness, because they can capture intricate, interactive systems that are often described in vague conceptual terms and organize them into a specific, concrete form (i.e., the FCM) that can be readily understood by social scientists and others. Using FL, we converted information taken from recently published, peer-reviewed articles for a select group of factors related to homelessness, and then calculated the strength of influence (weights) for pairs of factors. We then used these weighted relationships in an FCM to test the effects of increasing or decreasing individual factors or groups of factors. The results of these trials were explainable according to current empirical knowledge related to homelessness.
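To make the FCM procedure described in the Methods paragraph more concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes one common FCM update rule (each concept keeps its previous activation, adds the weighted activations of the concepts that influence it, and is squashed by a sigmoid). The three concepts and their weights are hypothetical values chosen only for illustration and are not taken from the article.

```python
import math


def fcm_step(state, weights):
    """One synchronous update: weights[i][j] is the influence of concept i on concept j."""
    n = len(state)
    new_state = []
    for j in range(n):
        total = state[j] + sum(weights[i][j] * state[i] for i in range(n) if i != j)
        new_state.append(1.0 / (1.0 + math.exp(-total)))  # sigmoid keeps activations in (0, 1)
    return new_state


def run_fcm(state, weights, clamped=None, max_iters=200, tol=1e-9):
    """Iterate until activations stop changing; clamped concepts are held at fixed values."""
    clamped = clamped or {}
    state = list(state)
    for idx, value in clamped.items():
        state[idx] = value
    for _ in range(max_iters):
        nxt = fcm_step(state, weights)
        for idx, value in clamped.items():
            nxt[idx] = value
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state


# Hypothetical concepts: 0 = education, 1 = poverty, 2 = homelessness.
# Hypothetical weights: education lowers poverty, poverty raises homelessness.
WEIGHTS = [
    [0.0, -0.6, 0.0],
    [0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0],
]

high_education = run_fcm([0.5, 0.5, 0.5], WEIGHTS, clamped={0: 1.0})
low_education = run_fcm([0.5, 0.5, 0.5], WEIGHTS, clamped={0: 0.0})
print("homelessness with high education:", round(high_education[2], 3))
print("homelessness with low education:", round(low_education[2], 3))
```

Clamping a concept and comparing the resulting steady states is one way to express the "increase or decrease a factor" trials mentioned above; other update rules and squashing functions are equally plausible choices.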

Results

Prior graphic maps of homelessness have been of limited use due to the dynamic nature of the concepts related to homelessness. The FCM technique captures greater degrees of dynamism and complexity than static models, allowing relevant concepts to be manipulated and to interact. This, in turn, allows for a much more realistic picture of homelessness. Through network analysis of the FCM we determined that Education exerts the greatest force in the model and hence impacts the dynamism and complexity of a social problem such as homelessness.

Conclusions

The FCM built to model the complex social system of homelessness reasonably represented reality for the sample scenarios created. This confirmed that the model worked and that a search of peer reviewed, academic literature is a reasonable foundation upon which to build the model. Further, it was determined that the direction and strengths of relationships between concepts included in this map are a reasonable approximation of their action in reality. However, dynamic models are not without their limitations and must be acknowledged as inherently exploratory.

Document type: 
Article

deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2011
Abstract: 

Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.

Document type: 
Article

Module Discovery by Exhaustive Search for Densely Connected, Co-Expressed Regions in Biomolecular Interaction Networks

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2010
Abstract: 

Background

Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well-studied large-scale data sources, available for many organisms that are not yet exhaustively annotated. It has been well established that, when these two data sources are analyzed jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear, and methods for exhaustively searching for such constellations had not been presented.

Methodology/Principal Findings

We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), by which the aforementioned search problem becomes tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automatized filtering procedure reduces our output which results in smaller collections of modules, comparable to state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition which adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search, by using the unreduced output to more quickly perform GO term related function prediction tasks. We point out the advantages of our exhaustive output by predicting functional relationships using two examples.

Conclusion/Significance

We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and largely available large-scale datasets.

Document type: 
Article

Linearization of Ancestral Multichromosomal Genomes

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2012
Abstract: 

Background

Recovering the structure of ancestral genomes can be formalized in terms of properties of binary matrices such as the Consecutive-Ones Property (C1P). The Linearization Problem asks to extract, from a given binary matrix, a maximum weight subset of rows that satisfies such a property. This problem is in general intractable, and in particular if the ancestral genome is expected to contain only linear chromosomes or a unique circular chromosome. In the present work, we consider a relaxation of this problem, which allows ancestral genomes that can contain several chromosomes, each either linear or circular.

Result

We show that, when restricted to binary matrices of degree two, which correspond to adjacencies, the genomic characters used in most ancestral genome reconstruction methods, this relaxed version of the Linearization Problem is polynomially solvable using a reduction to a matching problem. This result holds in the more general case where columns have bounded multiplicity, which models possibly duplicated ancestral genes. We also prove that for matrices with rows of degrees 2 and 3, without multiplicity and without weights on the rows, the problem is NP-complete, thus tracing sharp tractability boundaries.

Conclusion

As it happened for the breakpoint median problem, also used in ancestral genome reconstruction, relaxing the definition of a genome turns an intractable problem into a tractable one. The relaxation is adapted to some biological contexts, such as bacterial genomes with several replicons, possibly partially assembled. Algorithms can also be used as heuristics for hard variants. More generally, this work opens a way to better understand linearization results for ancestral genome structure inference.
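The abstract above names the Consecutive-Ones Property without defining it. Under the standard definition (a binary matrix has the C1P if its columns can be reordered so that the ones in every row are contiguous), a brute-force check for very small matrices might look like the sketch below. This is only an illustration of the definition: it tries every column order, so it is exponential in the number of columns, and it is not the matching-based method of the article. The function names and example matrices are hypothetical.

```python
from itertools import permutations


def row_is_consecutive(row):
    """True if all ones in the row form a single contiguous block (or there are none)."""
    ones = [i for i, value in enumerate(row) if value == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def has_c1p(matrix):
    """Brute-force C1P test: try every column order (only feasible for tiny inputs)."""
    if not matrix:
        return True
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if all(row_is_consecutive([row[c] for c in order]) for row in matrix):
            return True
    return False


# Two adjacencies over markers a, b, c: {a, c} and {b, c}. The order a, c, b works.
path_like = [
    [1, 0, 1],
    [0, 1, 1],
]

# Three adjacencies {a, b}, {b, c}, {a, c} form a cycle; no linear order fits all three.
cycle_like = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]

print(has_c1p(path_like))
print(has_c1p(cycle_like))
```

The second matrix fails the linear test because its three adjacencies close a cycle, which is the kind of structure that a genome model admitting circular chromosomes can accommodate.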

Document type: 
Article

smyRNA: A Novel Ab Initio ncRNA Gene Finder

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2009-05-05
Abstract: 

Background

Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by establishing stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Although ncRNAs are abundant, discovering novel ncRNAs in genome sequences has proven to be a hard task; in particular, past attempts at ab initio ncRNA search have mostly failed, with the exception of tools that can identify micro RNAs.

Methodology/Principal Findings

We present a very general ab initio ncRNA gene finder that exploits differential distributions of sequence motifs between ncRNAs and background genome sequences.

Conclusions/Significance

Our method, once trained on a set of ncRNAs from a given species, can be applied to genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability: http://compbio.cs.sfu.ca/taverna/smyrna

Document type: 
Article

The Generation Challenge Programme Platform: Semantic Standards and Workbench for Crop Science

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2008
Abstract: 

The Generation Challenge Programme (GCP) is a global crop research consortium directed toward crop improvement through the application of comparative biology and genetic resources characterization to plant breeding. A key consortium research activity is the development of a GCP crop bioinformatics platform to support GCP research. This platform includes the following: (i) shared, public platform-independent domain models, ontology, and data formats to enable interoperability of data and analysis flows within the platform; (ii) web service and registry technologies to identify, share, and integrate information across diverse, globally dispersed data sources, as well as to access high-performance computational (HPC) facilities for computationally intensive, high-throughput analyses of project data; (iii) platform-specific middleware reference implementations of the domain model integrating a suite of public (largely open-access/-source) databases and software tools into a workbench to facilitate biodiversity analysis, comparative analysis of crop genomic data, and plant breeding decision making.

Document type: 
Article

Conditional Random Fields and Supervised Learning in Automated Skin Lesion Diagnosis

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2011
Abstract: 

Many subproblems in automated skin lesion diagnosis (ASLD) can be unified under a single generalization of assigning a label, from a predefined set, to each pixel in an image. We first formalize this generalization and then present two probabilistic models capable of solving it. The first model is based on independent pixel labeling using maximum a-posteriori (MAP) estimation. The second model is based on conditional random fields (CRFs), where dependencies between pixels are defined using a graph structure. Furthermore, we demonstrate how supervised learning and an appropriate training set can be used to automatically determine all model parameters. We evaluate both models' ability to segment a challenging dataset consisting of 116 images and compare our results to 5 previously published methods.
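As a rough illustration of the first model mentioned above (independent per-pixel MAP labeling), the sketch below picks, for each pixel on its own, the label that maximizes prior times likelihood. It assumes a single scalar intensity per pixel and a Gaussian likelihood per label. The class names, priors, means, and standard deviations are hypothetical and are not the features or parameters used in the article, which would be learned from a training set.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def map_label(intensity, classes):
    """Return the label maximizing log prior + log likelihood for one pixel.

    classes maps each label to a (prior, mean, std) tuple.
    """
    def log_posterior(label):
        prior, mean, std = classes[label]
        return math.log(prior) + gaussian_log_pdf(intensity, mean, std)

    return max(classes, key=log_posterior)


def label_image(image, classes):
    """Label every pixel independently; no dependency between neighbours is modelled."""
    return [[map_label(pixel, classes) for pixel in row] for row in image]


# Hypothetical two-label setup: darker pixels tend to be lesion, lighter pixels skin.
CLASSES = {
    "lesion": (0.3, 0.2, 0.1),
    "skin": (0.7, 0.8, 0.1),
}

image = [
    [0.15, 0.90],
    [0.25, 0.75],
]

labels = label_image(image, CLASSES)
print(labels)
```

Because each pixel is labeled in isolation, isolated misclassified pixels are not smoothed out; modelling dependencies between neighbouring pixels is what the CRF-based second model adds.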

Document type: 
Article