The image semantic segmentation challenge consists of classifying each pixel of an image (or just several ones) into an instance, where each instance (or category) corresponds to an object. This task is a part of the concept of scene understanding or better explaining the global context of an image. In the medical image analysis domain, image segmentation can be used for image-guided interventions, radiotherapy, or improved radiological diagnostics. Following a comprehensive review of state-of-the-art deep learning-based medical and non-medical image segmentation solutions, we make the following contributions. A deep learning-based (medical) image segmentation typical pipeline includes designing layers (A), designing an architecture (B), and defining a loss function (C). A clean/modified (D)/adversarialy perturbed (E) image is fed into a model (consisting of layers and loss function) to predict a segmentation mask for scene understanding etc. In some cases where the number of segmentation annotations is limited, weakly supervised approaches (F) are leverages. For some applications where further analysis is needed e.g., predicting volumes and objects burden, the segmentation mask is fed into another post-processing step (G). In this thesis, we tackle each of the steps (A-G). I) As for step (A and E), we studied the effect of the adversarial perturbation on image segmentation models and proposed a method that improves the segmentation performance via a non-linear radial basis convolutional feature mapping by learning a Mahalanobis-like distance function on both adversarially perturbed and unperturbed images. Our method then maps the convolutional features onto a linearly well-separated manifold, which prevents small adversarial perturbations from forcing a sample to cross the decision boundary. II) As for step (B), we propose light, learnable skip connections which learn first to select the most discriminative channels and then aggregate the selected ones as single-channel attending to the most discriminative regions of input. Compared to the heavy classical skip connections, our method reduces the computation cost and memory usage while it improves segmentation performance. III) As for step (C), we examined the critical choice of a loss function in order to handle the notorious imbalance problem that plagues both the input and output of a learning model. In order to tackle both types of imbalance during training and inference, we introduce a new curriculum learning-based loss function. Specifically, we leverage the Dice similarity coefficient to deter model parameters from being held at bad local minima and at the same time, gradually learn better model parameters by penalizing for false positives/negatives using a cross-entropy term which also helps. IV) As for step (D), we propose a new segmentation performance-boosting paradigm that relies on optimally modifying the network's input instead of the network itself. In particular, we leverage the gradients of a trained segmentation network with respect to the input to transfer it into a space where the segmentation accuracy improves. V) As for step (F), we propose a weakly supervised image segmentation model with a learned spatial masking mechanism to filter out irrelevant background signals from attention maps. The proposed method minimizes mutual information between a masked variational representation and the input while maximizing the information between the masked representation and class labels. VI) Although many semi-automatic segmentation based methods have been developed, as for step (G), we introduce a method that completely eliminates the segmentation step and directly estimates the volume and activity of the lesions from positron emission tomography scans.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Hamarneh, Ghassan
Member of collection