Resource type
Thesis type
(Thesis) Ph.D.
Date created
2024-04-24
Authors/Contributors
Author: Cardoen, Ben
Abstract
Multichannel superresolution fluorescence microscopy (SRM) enables scientific discovery of biological interactions with up to nanometer precision. Reconstructing those interactions provides a new understanding of cellular function and mechanisms of degenerative and infectious diseases. This thesis introduces novel, adaptive, and scalable algorithms that reconstruct SRM interaction with a precision higher than that of state-of-the-art superresolution microscopes. Data from single-molecule localization microscopy (SMLM), a subset of SRM, come in the form of point cloud localizations. To study interaction between two SMLM point clouds, we first introduce a vector field-based unsupervised interaction analysis method, SIZLE, which can identify protein structures by their interaction alone. Since SMLM localization suffers from imbalanced reconstruction due to highly variable density, in our second contribution, we introduce a novel Pareto optimal graph tiling method, ERGO, that leverages a recurrent neural network (LSTM) for unbiased reconstruction of protein localizations in SMLM. Analyzing nanometer-precision interaction in voxel-based SRM requires adaptive subcellular structure detection. To this end, we introduce a novel weakly supervised object detection and identification method, SPECHT, designed to work robustly across conditions, signal-to-noise, and channels, where each channel captures a distinctly labelled protein. SPECHT extends belief theory to leverage the uncertainty per object into a measure of conflict, allowing models to be combined across datasets without retraining. As the precision of the microscope does not always allow for segmentation of objects, in our fourth contribution, we introduce a novel probabilistic segmentation-free membrane contact detection algorithm (MCS-DETECT) in 3D STED. The algorithm reconstructs variable-width contacts between organelle membranes with a precision higher than that typically allowed by the microscope, without requiring ground truth, while proving to be consistent across multiple cell lines and genomic conditions. When analysing the attributes of the thousands of interacting structures per cell, we noticed how the commonly used log transform can lead to an unexpected reversal of conclusions. We name this novel statistical phenomenon the Log-Paradox, and prove that it can occur in datasets with long-tailed distributions. We derive the conditions for the Log-Paradox and present a heuristic that maximizes it.
Finally, as the above computational biology studies require careful curation and validation of large heterogeneous datasets, which can be a tremendous manual effort, we introduce a novel software framework, DataCurator, that translates human-readable code-free recipes into scalable executable code to curate, validate, and process large heterogeneous biomedical datasets, using existing or new code in Julia, R, or Python.
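The abstract's remark that a log transform can reverse a conclusion can be illustrated with a toy comparison of two group means. The sketch below is only an assumption about one way such a reversal can arise: the data, the names `group_a` and `group_b`, and the helper `larger_mean` are hypothetical and are not taken from the thesis, whose formal conditions for the Log-Paradox are not reproduced here.

```python
import math
from statistics import mean

# Hypothetical toy data (not from the thesis).
# group_a is long-tailed: two small values and one large outlier.
group_a = [1.0, 1.0, 100.0]
group_b = [10.0, 10.0, 10.0]


def larger_mean(a, b):
    """Return 'a', 'b', or 'tie' depending on which sample has the larger mean."""
    mean_a, mean_b = mean(a), mean(b)
    if mean_a > mean_b:
        return "a"
    if mean_b > mean_a:
        return "b"
    return "tie"


# Compare the groups on the raw scale, then again after a log10 transform.
raw_winner = larger_mean(group_a, group_b)
log_winner = larger_mean(
    [math.log10(x) for x in group_a],
    [math.log10(x) for x in group_b],
)

print("larger mean on raw scale:", raw_winner)
print("larger mean on log scale:", log_winner)
```

On the raw scale the single large value dominates the mean of `group_a`, while the log transform compresses it, so the two comparisons may disagree about which group is larger.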
Document
Extent
290 pages.
Identifier
etd23034
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Hamarneh, Ghassan
Language
English
Member of collection
Download file | Size |
---|---|
etd23034.pdf | 140.83 MB |