Spatial-temporal autoencoder with attention network for video compression

Sigger, Neetu, Al-Jawed, Naseer and Nguyen, Tuan (ORCID: https://orcid.org/0000-0003-0055-8218) (2022) Spatial-temporal autoencoder with attention network for video compression. In: Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Online. February 6-8, 2022. Computer Vision Theory and Applications (VISAPP), 4. Computer Vision Theory and Applications (VISAPP) - SCITEPRESS Digital Library, Setúbal, Portugal, pp. 364-371. ISBN 978-9897585555; ISSN 2184-4321 (doi:10.5220/0010811900003124)

PDF (AAM)
36020_NGUYEN_Spatial_temporal_autoencoder.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Deep learning-based approaches are now state of the art in numerous tasks, including video compression, and are having a revolutionary influence on video processing. Learned video compression methods have recently developed rapidly, with promising results. In this paper, taking advantage of the powerful non-linear representation ability of neural networks, we replace each standard component of video compression with a neural network. We propose a spatial-temporal video compression network (STVC) that uses spatial-temporal priors with an attention module (STPA). On the one hand, joint spatial-temporal priors are used to generate latent representations and reconstruct compressed outputs, because efficient representation of temporal and spatial information plays a crucial role in video coding. On the other hand, we add an efficient and effective attention module so that the model devotes more effort to restoring artifact-rich areas. Moreover, we formulate rate-distortion optimization as a single loss function, with which the network learns to exploit the spatial-temporal redundancy present in the frames and to decrease the bit rate while maintaining visual quality in the decoded frames. The experimental results show that our approach delivers state-of-the-art learned video compression performance in terms of MS-SSIM and PSNR.
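
The abstract does not give the form of the single rate-distortion loss. As a rough illustration only, the sketch below uses the common formulation from the learned-compression literature, loss = rate + lambda * distortion, with the rate estimated in bits per pixel from the entropy model's symbol likelihoods and the distortion measured as mean squared error. The function names, the weight `lam`, the choice of MSE, and the toy values are all assumptions made for this example and are not taken from the paper.

```python
import math


def bits_per_pixel(likelihoods, num_pixels):
    """Estimated rate: total code length, -log2(p) per latent symbol, divided by pixel count."""
    return sum(-math.log2(p) for p in likelihoods) / num_pixels


def mse(original, reconstructed):
    """Distortion: mean squared error between original and decoded pixel values."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def rate_distortion_loss(original, reconstructed, likelihoods, lam):
    """Single scalar objective (assumed form): rate + lam * distortion."""
    rate = bits_per_pixel(likelihoods, len(original))
    distortion = mse(original, reconstructed)
    return rate + lam * distortion


# Toy frame of four pixels and two latent symbols (hypothetical values).
original = [0.0, 0.5, 1.0, 0.5]
reconstructed = [0.0, 0.5, 0.5, 0.5]
likelihoods = [0.5, 0.25]  # probabilities assigned by an assumed entropy model

loss = rate_distortion_loss(original, reconstructed, likelihoods, lam=4.0)
print(loss)
```

In this formulation a larger `lam` favours reconstruction quality at the cost of a higher bit rate, while a smaller `lam` favours a lower bit rate. Swapping `mse` for a term such as 1 - MS-SSIM would target the other metric the abstract reports.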

Item Type: Conference Proceedings
Title of Proceedings: Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Online. February 6-8, 2022.
Uncontrolled Keywords: video compression; deep learning; auto-encoder; rate-distortion optimization; attention mechanism
Subjects: N Fine Arts > N Visual arts (General) For photography, see TR
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Faculty / School / Research Centre / Research Group: Faculty of Engineering & Science
Faculty of Engineering & Science > School of Computing & Mathematical Sciences (CMS)
Last Modified: 19 May 2022 08:38
URI: http://gala.gre.ac.uk/id/eprint/36020
