TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution

Jiang, Qin; Wang, Qing Lin; Chi, Li Hua; Chen, Xin Hai; Zhang, Qing Yang; Zhou, Richard; Deng, Zheng Qiu; Deng, Jin Sheng; Tang, Bin Bing; Lv, Shao He; Liu, Jie

Date

2024

Authors

Jiang, Qin
Wang, Qing Lin
Chi, Li Hua
Chen, Xin Hai
Zhang, Qing Yang
Zhou, Richard
Deng, Zheng Qiu
Deng, Jin Sheng
Tang, Bin Bing
Lv, Shao He
Liu, Jie

Publisher

The Eurographics Association and John Wiley & Sons Ltd.

Abstract

Latent diffusion models (LDMs) have demonstrated remarkable success in generative modeling, and leveraging these diffusion priors is a promising way to improve performance on image and video tasks. However, applying LDMs to video super-resolution (VSR) presents significant challenges: generated videos must have realistic details and temporal consistency, and both requirements are made harder by the inherent stochasticity of the diffusion process. In this work, we propose a novel diffusion-based framework, the Temporal-awareness Latent Diffusion Model (TempDiff), specifically designed for real-world video super-resolution, where degradations are diverse and complex. TempDiff harnesses the powerful generative prior of a pre-trained diffusion model and enhances temporal awareness through the following mechanisms: 1) incorporating temporal layers into the denoising U-Net and the VAE decoder, and fine-tuning only these added modules to maintain temporal coherence; 2) estimating optical flow with a pre-trained flow network and using it to guide latent optimization and propagation across video sequences, ensuring overall stability in the generated high-quality video. Extensive experiments demonstrate that TempDiff achieves compelling results, outperforming state-of-the-art methods on both synthetic and real-world VSR benchmark datasets. Code will be available at https://github.com/jiangqin567/TempDiff
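The abstract does not spell out how the optical flow acts on the latents. As a rough illustration of the general technique only (flow-based backward warping, widely used in flow-guided video restoration), here is a minimal NumPy sketch. It assumes latents of shape (C, H, W) and a dense flow field already resized to latent resolution; the names `warp_latent` and `propagate` and the fixed blending weight `alpha` are hypothetical, occlusion handling is omitted, and TempDiff's actual latent optimization and propagation scheme may differ.

```python
import numpy as np


def warp_latent(latent: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a latent map (C, H, W) with a dense flow field (2, H, W).

    flow[0] is the horizontal (x) displacement and flow[1] the vertical (y)
    displacement, in latent-pixel units. Output pixel (y, x) samples the input
    at (y + flow[1, y, x], x + flow[0, y, x]) with bilinear interpolation;
    sample positions outside the map are clamped to the border.
    """
    c, h, w = latent.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Sampling positions in the source latent, clamped to valid coordinates.
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)

    # Integer corners of the bilinear neighbourhood and fractional weights.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0
    wy = sy - y0

    # Interpolate along x on the two rows, then along y (broadcasts over C).
    top = latent[:, y0, x0] * (1 - wx) + latent[:, y0, x1] * wx
    bottom = latent[:, y1, x0] * (1 - wx) + latent[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def propagate(prev_latent: np.ndarray, cur_latent: np.ndarray,
              flow_cur_to_prev: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical propagation step: align the previous frame's latent to the
    current frame, then blend it with the current latent using a fixed weight."""
    aligned = warp_latent(prev_latent, flow_cur_to_prev)
    return alpha * aligned + (1 - alpha) * cur_latent
```

For example, with a flow whose x-component is 1 everywhere, each output pixel samples its right-hand neighbour, so a latent row [0, 1, 2, 3] becomes [1, 2, 3, 3] (the last column is clamped at the border). The flow passed to `propagate` points from the current frame to the previous one because backward warping looks up, for each current-frame position, where that content was in the previous frame.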

@article{10.1111:cgf.15211,
journal = {Computer Graphics Forum},
title = {{TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution}},
author = {Jiang, Qin and Wang, Qing Lin and Liu, Jie and Chi, Li Hua and Chen, Xin Hai and Zhang, Qing Yang and Zhou, Richard and Deng, Zheng Qiu and Deng, Jin Sheng and Tang, Bin Bing and Lv, Shao He},
year = {2024},
publisher = {The Eurographics Association and John Wiley & Sons Ltd.},
ISSN = {1467-8659},
DOI = {10.1111/cgf.15211}
}