
TriCoLo: Trimodal contrastive loss for text to shape retrieval

Resource type
Thesis
Thesis type
M.Sc.
Date created
2022-04-22
Authors/Contributors
Author: Ruan, Yue
Abstract
This thesis focuses on applying contrastive loss to learn joint embeddings over multimodal data and demonstrates its effectiveness on a downstream retrieval task. Previous work on joint representation learning for 3D shapes and text has mostly focused on improving embeddings by modeling complex attention between representations or through multi-task learning. We show that large-batch contrastive learning achieves state-of-the-art performance on text-to-shape retrieval without complex attention mechanisms or losses. Prior work on 3D and text representations has also focused on bimodal representation learning, pairing text with either voxels or multi-view images. We show that a trimodal learning scheme leads to even higher performance and better representations for all modalities.
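
The record itself gives no implementation details. Purely as an illustrative sketch, a trimodal contrastive objective of the kind the abstract describes can be formed by applying a symmetric cross-entropy (InfoNCE) loss to every pair of modality embeddings, where matching items in a batch are positives and all other batch items are negatives (which is why large batches help). All names, dimensions, and the temperature below are hypothetical and are not taken from the thesis.

    import torch
    import torch.nn.functional as F

    def info_nce(a, b, temperature=0.07):
        """Symmetric InfoNCE between two batches of embeddings.

        Row i of `a` and row i of `b` are a positive pair; every other
        row in the batch serves as a negative.
        """
        a = F.normalize(a, dim=-1)
        b = F.normalize(b, dim=-1)
        logits = a @ b.t() / temperature          # (B, B) similarity matrix
        targets = torch.arange(a.size(0), device=a.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    def trimodal_loss(text_emb, voxel_emb, image_emb):
        """Sum pairwise contrastive losses over the three modalities."""
        return (info_nce(text_emb, voxel_emb)
                + info_nce(text_emb, image_emb)
                + info_nce(voxel_emb, image_emb))

    # Toy usage: random tensors stand in for text, voxel, and
    # multi-view image encoder outputs of a shared embedding size.
    B, D = 128, 512
    loss = trimodal_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, D))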
Document
Extent
38 pages.
Identifier
etd21930
Copyright statement
Copyright is held by the author(s).
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Chang, Angel
Language
English
Download file
etd21930.pdf (7.16 MB)
