Resource type
Thesis type
(Thesis) M.Sc.
Date created
2022-04-22
Authors/Contributors
Author: Ruan, Yue
Abstract
The thesis applies a contrastive loss to learn joint embeddings over multimodal data and demonstrates its effectiveness on a downstream retrieval task. Previous work on joint representation learning for 3D shapes and text has mostly focused on improving embeddings through complex attention mechanisms between representations, or through multi-task learning. We show that large-batch contrastive learning achieves state-of-the-art performance on text-to-shape retrieval without complex attention mechanisms or losses. Prior work on 3D and text representations has also focused on bimodal learning, pairing text with either voxels or multi-view images. We show that a trimodal learning scheme leads to even higher performance and better representations for all modalities.
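To make the setup concrete, here is a minimal sketch of large-batch contrastive learning across three modalities. The symmetric InfoNCE-style loss, the temperature value, the `info_nce` and `trimodal_loss` names, and the choice to sum pairwise losses over text, voxel, and multi-view image embeddings are illustrative assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE-style contrastive loss for a batch of paired embeddings.

    a, b: (N, D) embeddings; row i of `a` is the positive match for row i of `b`.
    Every other row in the batch serves as a negative, which is why larger
    batches give the loss more negatives to contrast against.
    """
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature               # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Matched pairs lie on the diagonal; cross-entropy in both directions
    # pulls positives together and pushes in-batch negatives apart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def trimodal_loss(text: torch.Tensor, voxel: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """One way to extend the bimodal loss to three modalities:
    sum the contrastive loss over every pair of modalities."""
    return (info_nce(text, voxel) +
            info_nce(text, image) +
            info_nce(voxel, image))

# Example: embeddings for a batch of 128 (text, voxel, multi-view image) triples.
text = torch.randn(128, 256)
voxel = torch.randn(128, 256)
image = torch.randn(128, 256)
print(trimodal_loss(text, voxel, image))
```

Summing pairwise losses keeps every modality's encoder aligned with the other two in a single shared embedding space, which is one plausible reading of why the trimodal scheme can improve representations for all modalities.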
Document
Extent
38 pages.
Identifier
etd21930
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Chang, Angel
Language
English
Member of collection
| Download file | Size |
|---|---|
| etd21930.pdf | 7.16 MB |