Predictive models for chromatin folding: connecting sequence to structure

Date created: 
Bayesian statistics
Maximum-entropy modelling
Artificial neural networks

The DNA packaged inside a nucleus forms complex structures stabilized by a host of DNA-bound factors. This combination of DNA and bound factors is known as chromatin. Both the distribution of bound factors and the contacts between different locations on the DNA can now be measured on a genome-wide scale. Nevertheless, to what extent is the likelihood of contact between sites in the genome encoded by the spatial sequence of bound factors? Current approaches to this question primarily use simulations of heterogeneous polymers to generate structures from the locations of bound factors. In contrast, here we develop novel predictive models for connecting chromatin sequence to structure using statistical physics, information theory and machine learning. Since our methods do not require costly polymer simulations, they can quickly predict how changes in the distribution of bound factors affect structure. In addition, our methods are formulated in a manner that allows us to solve the inverse problem: given only structural data, predict the likely sequence of bound factors. We show that the models developed can make biologically meaningful predictions, highlighting key features of the mechanisms through which the three-dimensional conformation of DNA is coordinated by the interactions between DNA-bound factors.

Document type: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Senior supervisor: 
Eldon Emberly
Science: Department of Physics
Thesis type: 
(Thesis) Ph.D.