LEAD: Latent Realignment for Human Motion Diffusion

dc.contributor.author: Andreou, Nefeli
dc.contributor.author: Wang, Xi
dc.contributor.author: Fernández Abrevaya, Victoria
dc.contributor.author: Cani, Marie-Paule
dc.contributor.author: Chrysanthou, Yiorgos
dc.contributor.author: Kalogeiton, Vicky
dc.contributor.editor: Wimmer, Michael
dc.contributor.editor: Alliez, Pierre
dc.contributor.editor: Westermann, Rüdiger
dc.date.accessioned: 2025-11-07T08:32:57Z
dc.date.available: 2025-11-07T08:32:57Z
dc.date.issued: 2025
dc.description.abstract: Our goal is to generate realistic human motion from natural language. Modern methods often face a trade-off between model expressiveness and text-to-motion (T2M) alignment. Some align text and motion latent spaces but sacrifice expressiveness; others rely on diffusion models that produce impressive motions but lack semantic meaning in their latent space. This may compromise realism, diversity and applicability. Here, we address this by combining latent diffusion with a realignment mechanism, producing a novel, semantically structured space that encodes the semantics of language. Leveraging this capability, we introduce the task of textual motion inversion to capture novel motion concepts from a few examples. For motion synthesis, we evaluate LEAD on HumanML3D and KIT-ML and show performance comparable to the state of the art in terms of realism, diversity and text-motion consistency. Our qualitative analysis and user study reveal that our synthesised motions are sharper, more human-like and comply better with the text compared to modern methods. For motion textual inversion (MTI), our method demonstrates improvements in capturing out-of-distribution characteristics in comparison to traditional VAEs.
dc.description.number: 6
dc.description.sectionheaders: Original Article
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 44
dc.identifier.doi: 10.1111/cgf.70093
dc.identifier.issn: 1467-8659
dc.identifier.pages: 17 pages
dc.identifier.uri: https://doi.org/10.1111/cgf.70093
dc.identifier.uri: https://diglib.eg.org/handle/10.1111/cgf70093
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd.
dc.rights: Attribution-NonCommercial-NoDerivs 4.0 International License
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: animation
dc.subject: motion capture
dc.subject: motion control
dc.subject: Computing methodologies → Motion capture
dc.subject: Activity recognition and understanding
dc.subject: Learning paradigms
dc.title: LEAD: Latent Realignment for Human Motion Diffusion
File: 05_cgf70093.pdf (1.34 MB, Adobe Portable Document Format)