Machine learning has been used successfully to solve a variety of tasks in computer vision, language translation, and video games. However, the domains in which success has been greatest typically have access to large amounts of labeled data. Machine learning holds great promise for the life sciences, in areas such as improving patient diagnosis and predicting drug response, but these domains commonly lack the labeled data necessary to train an accurate model. Obtaining new labeled data often requires costly experiments, or may be impossible in the case of patient data, so we must get the most out of the limited data that exists. Strategies such as active learning, semi-supervised learning, and interpretable models aim to overcome the lack of labeled data by leveraging unlabeled data or domain knowledge. In this thesis, I present three works. First, I present work on predicting mutations that cause resistance to prostate cancer treatments, using a deep neural network with minimal parameters and simulated data to overcome label imbalance. Second, I present an interpretable deep neural network that predicts response to cancer drug treatments, using biological domain knowledge to form the architecture of the model. Lastly, I present a deep ensemble approach that dynamically trades off between active and semi-supervised learning using expected calibration error. Together, these works highlight different strategies and contribute novel approaches for learning accurate machine learning models from limited labeled data.
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Ester, Martin