Generating natural language summary for image sets

We address the problem of summarizing an image set with a natural language caption. We present PlacesCap, a new dataset for image set summarization. The dataset consists of 11,661 image sets containing a total of 116,113 images, where each set is summarized by a three-sentence caption. We propose novel permutation-invariant pooling operators for sets of feature maps and empirically evaluate image set summarization models based on those operators. We also conduct experiments on image set classification and show that the proposed set pooling operators achieve competitive performance.
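
To illustrate what permutation invariance means for pooling over a set of feature maps, the following is a minimal sketch. It uses element-wise mean and max, which are standard baseline operators and not the novel operators proposed in this work. The function name `pool_set` and the array layout (images, channels, height, width) are assumptions made only for this example.

```python
import numpy as np


def pool_set(feature_maps, mode="mean"):
    """Pool a set of feature maps into a single feature map.

    Hypothetical helper for illustration only. `feature_maps` is assumed
    to have shape (n_images, channels, height, width). Reducing over the
    first axis with mean or max gives a result that does not depend on
    the order of the images in the set.
    """
    x = np.asarray(feature_maps, dtype=np.float64)
    if x.ndim != 4:
        raise ValueError("expected shape (n_images, channels, height, width)")
    if mode == "mean":
        return x.mean(axis=0)  # element-wise average across the set
    if mode == "max":
        return x.max(axis=0)   # element-wise maximum across the set
    raise ValueError(f"unknown pooling mode: {mode!r}")


# Random stand-ins for CNN feature maps of a 10-image set (assumed sizes).
rng = np.random.default_rng(0)
image_set = rng.normal(size=(10, 8, 7, 7))
shuffled = image_set[rng.permutation(len(image_set))]

# Shuffling the set leaves the pooled representation unchanged
# (up to floating-point rounding in the case of the mean).
assert np.allclose(pool_set(image_set, "mean"), pool_set(shuffled, "mean"))
assert np.array_equal(pool_set(image_set, "max"), pool_set(shuffled, "max"))
```

Because the reduction runs only over the set axis, the pooled output keeps the spatial layout of a single feature map, so a downstream captioning or classification model could in principle consume it in the same way as the features of one image.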
Thesis advisor: Mori, Greg
