Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FlashMo: Geometric Interpolants and Frequency-Aware Sparsity for Scalable Efficient Motion Generation

Authors: Zeyu Zhang, Yiran Wang, Danning Li, Dong Gong, Ian Reid, Richard Hartley

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the large-scale Motion Hub V2 dataset and standard benchmarks including HumanML3D and KIT-ML demonstrate that our method significantly outperforms previous approaches in motion quality, efficiency, and scalability.
Researcher Affiliation | Academia | Zeyu Zhang¹, Yiran Wang¹, Danning Li², Dong Gong³, Ian Reid², Richard Hartley¹; ¹ANU, ²MBZUAI, ³UNSW
Pseudocode | No | The paper describes the methodology using textual explanations, mathematical formulations (e.g., equations for interpolants and attention mechanisms), and architectural diagrams (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | While the method is easy to implement and the authors intend to release code and data after acceptance, they are not yet publicly available and require institutional approval for open-sourcing code and model weights.
Open Datasets | Yes | For pretraining, we utilize the recent large-scale open-source dataset Motion Hub V2 [50]... For downstream evaluation, we conduct experiments on standard text-to-motion (T2M) datasets, including HumanML3D [31] and KIT-ML [61].
Dataset Splits | No | The paper mentions using 'standard text-to-motion benchmarks, including HumanML3D [31] and KIT-ML [61]' for evaluation. However, it does not provide explicit dataset split information such as percentages, sample counts, or a detailed splitting methodology for these datasets within the provided text.
Hardware Specification | Yes | All experiments are conducted on an Intel Xeon Platinum 8469C CPU at 2.60GHz, with a single NVIDIA H20 96G GPU and 32GB of RAM.
Software Dependencies | No | The paper mentions using a 'frozen text encoder from CLIP ViT-B/32' and the 'AdamW optimizer', but does not provide specific version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow) used in the implementation.
Experiment Setup | Yes | The encoder and decoder of the VAE consist of 4 layers with a compression rate r = 4. Motion SiT has a depth of 8, with a frequency ratio β = 0.5 and an attention threshold γ = 0.95. Both the VAE and Motion SiT use 4 attention heads with a latent dimension of 512. We employ a frozen text encoder from CLIP ViT-B/32. A constant learning rate of 1 × 10⁻⁴ is used, with a batch size of 256 and the AdamW optimizer. For fair comparison, each model is trained for 6K epochs during the VAE stage and 3K epochs during the diffusion stage. We adopt 1000 diffusion steps during training and 10 sampling steps during inference.
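
For readers attempting a reimplementation, the reported experiment setup can be collected into a single configuration object. The sketch below is only an illustration: the class name, field names, and derived properties are hypothetical and do not come from the paper or any released code, and the per-head dimension assumes the latent dimension is split evenly across attention heads, as in standard multi-head attention.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FlashMoSetup:
    """Hyperparameters as quoted in the Experiment Setup row.

    All field names are hypothetical; only the values come from the text.
    """
    vae_layers: int = 4                 # encoder and decoder layers of the VAE
    compression_rate: int = 4           # r
    sit_depth: int = 8                  # depth of Motion SiT
    frequency_ratio: float = 0.5        # beta
    attention_threshold: float = 0.95   # gamma
    attention_heads: int = 4            # used by both the VAE and Motion SiT
    latent_dim: int = 512
    text_encoder: str = "CLIP ViT-B/32 (frozen)"
    learning_rate: float = 1e-4         # constant schedule
    batch_size: int = 256
    optimizer: str = "AdamW"
    vae_epochs: int = 6000              # "6K epochs" in the VAE stage
    diffusion_epochs: int = 3000        # "3K epochs" in the diffusion stage
    train_diffusion_steps: int = 1000
    sampling_steps: int = 10

    @property
    def head_dim(self) -> int:
        # Assumption: the latent dimension is divided evenly across heads.
        return self.latent_dim // self.attention_heads

    @property
    def step_reduction(self) -> int:
        # Ratio of training diffusion steps to inference sampling steps.
        return self.train_diffusion_steps // self.sampling_steps


setup = FlashMoSetup()
print(asdict(setup))
print(setup.head_dim, setup.step_reduction)
```

Because no software versions are reported, a configuration like this fixes only the hyperparameters; library versions, random seeds, and dataset splits would still need to be chosen and recorded separately.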