Visual saliency is the propensity of a part of the scene to attract attention. Computational modeling of visual saliency has become an important research problem in recent years, with applications in quality assessment, compression, object tracking, and so on. Most saliency estimation models for dynamic scenes operate on raw video, and their high computational complexity is a serious drawback in practical applications. Our approach to reducing the complexity and memory requirements is to avoid decoding the compressed bitstream as much as possible. Since most modern cameras incorporate video encoders, this paves the way for in-camera saliency estimation, which could be useful in a variety of computer vision applications. In this dissertation, we present compressed-domain features that are highly indicative of saliency in natural video. Using these features, we construct two simple and effective saliency estimation models for compressed video. The proposed models have been extensively tested on two ground truth datasets using several accuracy metrics, and are shown to yield considerable improvement over several state-of-the-art compressed-domain and pixel-domain saliency models. Another contribution is a tracking algorithm that also uses only compressed-domain information to isolate moving regions and estimate their trajectories. The algorithm has been tested on a number of standard sequences, and the results demonstrate its advantages over the state of the art in compressed-domain tracking and segmentation, with over 30% improvement in F-measure.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Bajic, Ivan V.