Snow, Oliver

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2022-08-24

Authors/Contributors

Author: Snow, Oliver

Abstract

Machine learning has been used successfully to solve a variety of tasks in computer vision, language translation, and video games. However, the domains in which success has been greatest typically have access to large amounts of labeled data. Machine learning also holds great promise in the life sciences, in areas such as patient diagnosis and drug response prediction, but these domains commonly lack the labeled data necessary for training an accurate model. Obtaining new labeled data often requires costly experiments, or may be impossible in the case of patient data, so we must get the most out of the limited data that exists. Strategies such as active learning, semi-supervised learning, and interpretable models aim to overcome the lack of labeled data by leveraging unlabeled data or domain knowledge. In this thesis, I present three works. First, I present work on predicting mutations that cause resistance to prostate cancer treatments, using a deep neural network with minimal parameters and simulated data to overcome label imbalance. Second, I present an interpretable deep neural network that predicts response to cancer drug treatments, using biological domain knowledge to form the architecture of the model. Lastly, I present a deep ensemble approach that dynamically trades off between active and semi-supervised learning using expected calibration error. Together, these works highlight different strategies and contribute novel approaches for learning accurate machine learning models from little labeled data.

Extent

104 pages.

Identifier

etd22340

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Ester, Martin

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd22340.pdf	14.38 MB

Interactive machine learning for scarce molecular datasets
