TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution

dc.contributor.author: Jiang, Qin
dc.contributor.author: Wang, Qing Lin
dc.contributor.author: Chi, Li Hua
dc.contributor.author: Chen, Xin Hai
dc.contributor.author: Zhang, Qing Yang
dc.contributor.author: Zhou, Richard
dc.contributor.author: Deng, Zheng Qiu
dc.contributor.author: Deng, Jin Sheng
dc.contributor.author: Tang, Bin Bing
dc.contributor.author: Lv, Shao He
dc.contributor.author: Liu, Jie
dc.contributor.editor: Chen, Renjie
dc.contributor.editor: Ritschel, Tobias
dc.contributor.editor: Whiting, Emily
dc.date.accessioned: 2024-10-13T18:07:28Z
dc.date.available: 2024-10-13T18:07:28Z
dc.date.issued: 2024
dc.description.abstract: Latent diffusion models (LDMs) have demonstrated remarkable success in generative modeling. It is promising to leverage diffusion priors to enhance performance in image and video tasks. However, applying LDMs to video super-resolution (VSR) presents significant challenges due to the high demands for realistic details and temporal consistency in generated videos, exacerbated by the inherent stochasticity of the diffusion process. In this work, we propose a novel diffusion-based framework, the Temporal-awareness Latent Diffusion Model (TempDiff), specifically designed for real-world video super-resolution, where degradations are diverse and complex. TempDiff harnesses the powerful generative prior of a pre-trained diffusion model and enhances temporal awareness through the following mechanisms: 1) incorporating temporal layers into the denoising U-Net and VAE decoder, and fine-tuning these added modules to maintain temporal coherency; 2) estimating optical-flow guidance with a pre-trained flow network for latent optimization and propagation across video sequences, ensuring overall stability in the generated high-quality video. Extensive experiments demonstrate that TempDiff achieves compelling results, outperforming state-of-the-art methods on both synthetic and real-world VSR benchmark datasets. Code will be available at https://github.com/jiangqin567/TempDiff
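As a rough illustration of mechanism 1 in the abstract — inserting temporal operations between per-frame ones so that neighbouring frames influence each other's latents — the following is a minimal NumPy sketch, not the authors' implementation. The function name `temporal_mix`, the residual weight `alpha`, and the neighbour-averaging rule are all illustrative assumptions; the paper's actual temporal layers are learned modules inside the denoising U-Net and VAE decoder.

```python
import numpy as np

def temporal_mix(latents, alpha=0.3):
    """Blend each frame's latent with its temporal neighbours.

    latents: array of shape (T, C, H, W) — a latent video sequence.
    alpha:   weight of the temporal term; alpha=0 leaves frames unchanged.
    """
    T = latents.shape[0]
    out = np.empty_like(latents)
    for t in range(T):
        prev_f = latents[max(t - 1, 0)]      # clamp at sequence boundaries
        next_f = latents[min(t + 1, T - 1)]
        neighbour_mean = 0.5 * (prev_f + next_f)
        # Residual form: keep the per-frame latent, add a temporal correction.
        out[t] = (1.0 - alpha) * latents[t] + alpha * neighbour_mean
    return out

# Usage: 8 frames of 4-channel 16x16 latents.
video_latents = np.random.rand(8, 4, 16, 16).astype(np.float32)
smoothed = temporal_mix(video_latents, alpha=0.3)
```

The residual form mirrors how temporal layers are typically added to a pre-trained image model: with the temporal weight at zero the network reduces to its original per-frame behaviour, so only the added modules need fine-tuning.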
dc.description.number: 7
dc.description.sectionheaders: Image and Video Enhancement I
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 43
dc.identifier.doi: 10.1111/cgf.15211
dc.identifier.issn: 1467-8659
dc.identifier.pages: 12 pages
dc.identifier.uri: https://doi.org/10.1111/cgf.15211
dc.identifier.uri: https://diglib.eg.org/handle/10.1111/cgf15211
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd.
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Computing methodologies → Computer vision tasks
dc.title: TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution
Files
Original bundle
Name: cgf15211.pdf
Size: 6.39 MB
Format: Adobe Portable Document Format