Khani, Aliasghar

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2024-04-12

Authors/Contributors

Author: Khani, Aliasghar

Abstract

Deep learning has revolutionized computer vision, specifically image classification and segmentation, through the use of over-parameterized models. However, the utilization of over-parameterized models presents its own challenges. A fundamental issue when using large models for image classification is the risk of over-fitting to spurious features, rather than learning meaningful data representations. Segmentation, on the other hand, faces the challenge of requiring dense annotations, which are both costly and difficult to obtain. In this thesis, we propose two solutions, each addressing one of these challenges. First, we introduce a masking strategy named MaskTune, designed to mitigate the over-reliance of classification models on spurious features. This strategy forces the model to explore new features during fine-tuning through strategic masking. Second, we present SLiMe, a segmentation approach that frames the problem as a one-shot optimization task. By leveraging a novel concept called the weighted accumulated self-attention map and cross-attention map from the UNet of text-conditioned Stable Diffusion (SD), we optimize text embeddings to highlight areas corresponding to segmentation mask foregrounds, enabling segmentation of unseen images. Furthermore, utilizing additional annotated data, especially in a few-shot scenario, enhances SLiMe's performance. Our evaluations on various datasets, including CelebA, Waterbirds, ImageNet-9, CIFAR-10, SVHN, PASCAL-Part, and CelebAMask-HQ, demonstrate the superiority of both MaskTune and SLiMe compared to state-of-the-art methods.

Extent

67 pages.

Identifier

etd22997

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Hamarneh, Ghassan

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd22997.pdf	35.23 MB

Mitigating spurious correlations and enhancing one-shot image segmentation
