Learning functional dependencies (mapping) between arbitrary input and output spaces is one of the main challenges in computational intelligence. There have been two main threads in the literature for solving this problem -- one focusing on designing more discriminative representation of the input and another one focusing on designing flexible mapping functions.Interestingly, for many applications, the outputs follow a structure, which can be exploited to narrow down the space of possible (most likely) outputs and consequently boost the overall mapping performance. Applications with this property include object detection (computer vision), object category segmentation (computer vision), parsing (natural language processing), etc. Current algorithms for learning the parameters of the model in structured prediction iteratively find the most confusing output configuration -- the configuration that receives high score according to the model, but is very different from the ground truth output -- and update the model parameters to suppress its score. Here, finding the most confusing configuration is the most expensive procedure in learning. In this thesis we propose two algorithms for approximately finding the most confusing configuration when the model is a Markov network. Each algorithm works for a large group of non-decomposable performance measures that arise in many real-world applications. We first design a baseline that achieves state-of-the-art results in our main application of object category segmentation on person class by introducing fine and coarse clothing texture cues as a set of new features. Then, we propose our first algorithm that approximates the non-decomposable loss function in false positive and false negative space with a piecewise planar function and finds the most confusing output in each piece. Our second proposed algorithm decomposes the dual of the objective into a supermodular Markov random field and the loss function augmented with a linear term -- both being efficient to optimize. We empirically show the superiority of the two proposed algorithms over our baseline and another strong baseline -- both used widely in the literature -- on two main applications, object category segmentation (on PASCAL VOC 2009 and 2010 and H3D datasets) and action retrieval (on our nursing home dataset).
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Member of collection