Discriminative latent variable models for visual recognition

Resource type
Thesis type
(Thesis) Ph.D.
Date created
Visual Recognition is a central problem in computer vision, and it has numerous potential applications in many di erent elds, such as robotics, human computer interaction, and entertainment. In this dissertation, we propose two discriminative latent variable models for handling challenging visual recognition problems. In particular, we use latent variables to capture and model various prior knowledge in the training data. In the rst model, we address the problem of recognizing human actions from still images. We jointly consider both poses and actions in a uni ed framework, and treat human poses as latent variables. The learning of this model follows the framework of latent SVM. Secondly, we propose another latent variable model to address the problem of automated tag learning on YouTube videos. In particular, we address the semantic variations (sub-tags) of the videos which have the same tag. In the model, each video is assumed to be associated with a sub-tag label, and we treat this sub-tag label as latent information. This model is trained using a latent learning framework based on LogitBoost, which jointly considers both the latent sub-tag label and the tag label. Moreover, we propose a novel discriminative latent learning framework, kernel latent SVM, which combines the bene t of latent SVM and kernel methods. The framework of kernel latent SVM is general enough to be applied in many applications of visual recognition. It is also able to handle complex latent variables with interdependent structures using composite kernels.
Copyright statement
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Mori, Greg
Member of collection
Attachment Size
etd7478_WYang.pdf 18.58 MB