TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution

dc.contributor.author: Jiang, Qin
dc.contributor.author: Wang, Qing Lin
dc.contributor.author: Chi, Li Hua
dc.contributor.author: Chen, Xin Hai
dc.contributor.author: Zhang, Qing Yang
dc.contributor.author: Zhou, Richard
dc.contributor.author: Deng, Zheng Qiu
dc.contributor.author: Deng, Jin Sheng
dc.contributor.author: Tang, Bin Bing
dc.contributor.author: Lv, Shao He
dc.contributor.author: Liu, Jie
dc.contributor.editor: Chen, Renjie
dc.contributor.editor: Ritschel, Tobias
dc.contributor.editor: Whiting, Emily
dc.date.accessioned: 2024-10-13T18:07:28Z
dc.date.available: 2024-10-13T18:07:28Z
dc.date.issued: 2024
dc.description.abstract: Latent diffusion models (LDMs) have demonstrated remarkable success in generative modeling. It is promising to leverage diffusion priors to enhance performance in image and video tasks. However, applying LDMs to video super-resolution (VSR) presents significant challenges due to the high demands for realistic details and temporal consistency in generated videos, exacerbated by the inherent stochasticity of the diffusion process. In this work, we propose a novel diffusion-based framework, the Temporal-awareness Latent Diffusion Model (TempDiff), specifically designed for real-world video super-resolution, where degradations are diverse and complex. TempDiff harnesses the powerful generative prior of a pre-trained diffusion model and enhances temporal awareness through the following mechanisms: 1) incorporating temporal layers into the denoising U-Net and VAE decoder, and fine-tuning these added modules to maintain temporal coherency; 2) estimating optical-flow guidance with a pre-trained flow network for latent optimization and propagation across video sequences, ensuring overall stability in the generated high-quality video. Extensive experiments demonstrate that TempDiff achieves compelling results, outperforming state-of-the-art methods on both synthetic and real-world VSR benchmark datasets. Code will be available at https://github.com/jiangqin567/TempDiff
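As a rough illustration of mechanism 1 in the abstract — inserting temporal operations between per-frame ones so that neighbouring frames influence each other's latents — the following is a minimal NumPy sketch, not the authors' implementation. The function name `temporal_mix`, the residual weight `alpha`, and the neighbour-averaging rule are all illustrative assumptions; the paper's actual temporal layers are learned modules inside the denoising U-Net and VAE decoder.

```python
import numpy as np

def temporal_mix(latents, alpha=0.3):
    """Blend each frame's latent with its temporal neighbours.

    latents: array of shape (T, C, H, W) — a latent video sequence.
    alpha:   weight of the temporal term; alpha=0 leaves frames unchanged.
    """
    T = latents.shape[0]
    out = np.empty_like(latents)
    for t in range(T):
        prev_f = latents[max(t - 1, 0)]      # clamp at sequence boundaries
        next_f = latents[min(t + 1, T - 1)]
        neighbour_mean = 0.5 * (prev_f + next_f)
        # Residual form: keep the per-frame latent, add a temporal correction.
        out[t] = (1.0 - alpha) * latents[t] + alpha * neighbour_mean
    return out

# Usage: 8 frames of 4-channel 16x16 latents.
video_latents = np.random.rand(8, 4, 16, 16).astype(np.float32)
smoothed = temporal_mix(video_latents, alpha=0.3)
```

The residual form mirrors how temporal layers are typically added to a pre-trained image model: with the temporal weight at zero the network reduces to its original per-frame behaviour, so only the added modules need fine-tuning.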
dc.description.number: 7
dc.description.sectionheaders: Image and Video Enhancement I
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 43
dc.identifier.doi: 10.1111/cgf.15211
dc.identifier.issn: 1467-8659
dc.identifier.pages: 12 pages
dc.identifier.uri: https://doi.org/10.1111/cgf.15211
dc.identifier.uri: https://diglib.eg.org/handle/10.1111/cgf15211
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd.
dc.rights: Attribution 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: CCS Concepts: Computing methodologies → Computer vision tasks
dc.title: TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution
Files
Original bundle
Name: cgf15211.pdf
Size: 6.39 MB
Format: Adobe Portable Document Format