We address the problem of summarizing a set of images with a natural language caption. We present PlacesCap, a new dataset for image set summarization. The dataset consists of 11,661 image sets with a total of 116,113 images, where each set is summarized by a three-sentence caption. We propose novel permutation-invariant pooling operators for sets of feature maps, and empirically evaluate image set summarization models based on those operators. We also conduct experiments on image set classification and show that the proposed set pooling operators achieve competitive performance.
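To illustrate what permutation invariance means for pooling over a set of feature maps, the following is a minimal sketch using element-wise mean and max pooling. These are standard baseline operators chosen for illustration only; they are not the novel operators proposed in the thesis, and the function name `pool_set` and the toy data are assumptions introduced here.

```python
from itertools import permutations


def pool_set(feature_maps, op="mean"):
    """Pool a set of equally shaped 2-D feature maps element-wise.

    Hypothetical helper for illustration: `feature_maps` is a list of
    nested lists (rows x cols), one map per image in the set.
    """
    if not feature_maps:
        raise ValueError("cannot pool an empty set")
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    reducers = {"mean": lambda values: sum(values) / len(values), "max": max}
    reduce_fn = reducers[op]
    # Reduce across the set dimension at every spatial location.
    return [
        [reduce_fn([fm[r][c] for fm in feature_maps]) for c in range(cols)]
        for r in range(rows)
    ]


# Toy "image set": three 2x2 feature maps (made-up values).
image_set = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 1.0], [2.0, 3.0]],
]

mean_pooled = pool_set(image_set, "mean")
max_pooled = pool_set(image_set, "max")

# Reordering the images in the set must not change the pooled map.
invariant = all(
    pool_set(list(order), "max") == max_pooled
    for order in permutations(image_set)
)
```

Because the reduction runs over the set dimension with an order-independent function, the pooled map can be passed to a downstream caption decoder or classifier regardless of how the images in the set are ordered.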
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Mori, Greg