Traditional image compression generally relies on linear transforms and has no knowledge of the image content. This thesis applies deep learning and computer vision to address these limitations from several directions. Traditional codecs such as JPEG inevitably introduce compression artifacts. In Chapter 4, we build a stacked multi-context channel-wise attention network that adaptively integrates features from different scales along the channel dimension to remove JPEG compression artifacts. Experiments show that our model outperforms other methods while maintaining low complexity.
In Chapter 5, we embed the Trellis Coded Quantizer (TCQ) into a deep learning-based image compression framework. The gradient of quantization is zero almost everywhere, and the discrete probability distribution is non-differentiable. We therefore apply a soft approximation during training so that gradients can be backpropagated through the quantizer.
Deep image compression models are trained separately for each bit rate, which is time-consuming. Chapter 6 presents a variable-rate image compression model whose base layer incorporates vision transformers. Residual coding is adopted as an enhancement layer, so a single trained model produces results across a range of bit rates. Our model achieves better or comparable performance relative to other variable-rate learned image compression models.
Finally, we design learned image compression models whose reconstructed images are evaluated on downstream computer vision tasks. Region-of-interest (ROI) coding methods encode the foreground with better quality than the background. Chapter 7 proposes an ROI-based deep image compression model with Swin transformers, in which the binary ROI mask is integrated into different layers of the network to provide spatial guidance. Our model outperforms non-ROI methods in ROI compression, which leads to better object detection and instance segmentation performance.
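The Chapter 5 argument about zero gradients can be made concrete with a scalar example. The sketch below is neither TCQ, which uses the Viterbi algorithm to search a trellis over a partitioned codebook, nor the thesis's own relaxation. It assumes a generic softmax-based "soft-to-hard" scalar quantizer with a hypothetical codebook `centers` and sharpness parameter `sigma`. The hard quantizer is piecewise constant, so its derivative is zero almost everywhere. The soft version is a softmax-weighted average of the centers, and its derivative, 2·sigma·Var_p[c], is nonzero and can be backpropagated.

```python
import math


def hard_quantize(x, centers):
    """Nearest-center quantization: piecewise constant, so d/dx = 0 almost everywhere."""
    return min(centers, key=lambda c: abs(x - c))


def soft_quantize(x, centers, sigma):
    """Softmax-weighted average of the centers; differentiable in x.

    Returns the soft value and the softmax weights p_i.
    """
    # Logits are negative scaled squared distances to each center.
    logits = [-sigma * (x - c) ** 2 for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(p * c for p, c in zip(probs, centers)), probs


def soft_quantize_grad(x, centers, sigma):
    """Analytic derivative of the soft value w.r.t. x: 2 * sigma * Var_p[c]."""
    _, probs = soft_quantize(x, centers, sigma)
    mean = sum(p * c for p, c in zip(probs, centers))
    var = sum(p * (c - mean) ** 2 for p, c in zip(probs, centers))
    return 2.0 * sigma * var


if __name__ == "__main__":
    centers = [-1.0, 0.0, 1.0]  # hypothetical codebook
    x = 0.4
    print("hard:", hard_quantize(x, centers))
    print("soft:", soft_quantize(x, centers, sigma=10.0)[0])
    print("grad:", soft_quantize_grad(x, centers, sigma=10.0))
```

A common setup is straight-through: the hard value is passed forward and the soft gradient is used backward. A larger `sigma` brings the soft value closer to the hard one, but the gradient then concentrates near the decision boundaries.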
In Chapter 8, we jointly optimize an image compression model and a computer vision model. A novel adaptive bit allocation strategy that combines an object mask with a channel attention module is integrated into the learned image compression architecture. Experiments on three diverse computer vision tasks show good image compression performance and superior computer vision accuracy.
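The abstract does not say how the object mask and channel attention are combined in Chapter 8. The sketch below shows one hypothetical way to combine them in plain Python. A per-channel gate, `masked_channel_attention`, is computed from foreground-only average pooling. It is a simplified squeeze-and-excitation gate, and made-up scalar `weights` stand in for a learned MLP. An assumed spatial gain, `bg_gain`, then shrinks background elements before rounding, so those elements cost fewer bits. The nonzero-symbol count is only a crude proxy for rate; a learned codec would use an entropy model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def masked_channel_attention(latent, mask, weights):
    """One gate per channel from foreground-only average pooling (simplified SE-style gate).

    latent:  list of channels, each an H x W list of floats.
    mask:    H x W list of 0/1 values (1 = object/foreground).
    weights: hypothetical per-channel gate weights (stand-in for a learned MLP).
    """
    fg = [(i, j) for i, row in enumerate(mask) for j, m in enumerate(row) if m]
    gates = []
    for ch, w in zip(latent, weights):
        pooled = sum(abs(ch[i][j]) for i, j in fg) / max(len(fg), 1)
        gates.append(sigmoid(w * pooled))
    return gates


def allocate_and_quantize(latent, mask, weights, bg_gain=0.1):
    """Scale by channel gate and spatial gain, then round to integer symbols.

    Smaller gains shrink values toward zero before rounding, i.e. a coarser
    effective quantization step and fewer bits for those elements.
    """
    gates = masked_channel_attention(latent, mask, weights)
    symbols = []
    for ch, g in zip(latent, gates):
        symbols.append([[round(v * g * (1.0 if m else bg_gain)) for v, m in zip(row, mrow)]
                        for row, mrow in zip(ch, mask)])
    return symbols, gates


def dequantize(symbols, mask, gates, bg_gain=0.1):
    """Undo the gains; assumes the decoder also knows the mask and gates (e.g. as side information)."""
    return [[[s / (g * (1.0 if m else bg_gain)) for s, m in zip(row, mrow)]
             for row, mrow in zip(ch, mask)]
            for ch, g in zip(symbols, gates)]


def count_nonzero(symbols):
    """Crude bit-cost proxy: number of nonzero symbols."""
    return sum(1 for ch in symbols for row in ch for s in row if s != 0)


if __name__ == "__main__":
    latent = [[[4.0, 4.0], [4.0, 4.0]]]  # one 2x2 channel
    mask = [[1, 0], [0, 0]]              # only the top-left position is foreground
    symbols, gates = allocate_and_quantize(latent, mask, weights=[1.0])
    print(symbols, count_nonzero(symbols))
```

In this toy input all four latent values are equal, yet only the foreground element survives rounding as a nonzero symbol. That is the qualitative effect a mask-guided bit allocation aims for.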
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Liang, Jie