Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
U-REPA: Aligning Diffusion U-Nets to ViTs
Authors: Yuchuan Tian, Hanting Chen, Mengyu Zheng, Yuchen Liang, Chao Xu, Yunhe Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments indicate that the resulting U-REPA could achieve excellent generation quality and greatly accelerates the convergence speed. With CFG guidance interval, U-REPA could reach FID < 1.5 in 200 epochs or 1M iterations on ImageNet 256×256, and needs only half the total epochs to perform better than REPA under sd-vae-ft-ema. Codes: https://github.com/YuchuanTian/U-REPA |
| Researcher Affiliation | Collaboration | 1 State Key Lab of General AI, School of Intelligence Science and Technology, Peking University. 2 Huawei Noah's Ark Lab. 3 The University of Sydney. 4 School of Mathematical Sciences, Peking University. |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Codes: https://github.com/YuchuanTian/U-REPA |
| Open Datasets | Yes | All experiments are conducted on the ImageNet 2012 benchmark [8] under a controlled environment with a fixed random seed (global seed=0). |
| Dataset Splits | No | All experiments are conducted on the ImageNet 2012 benchmark [8] under a controlled environment with a fixed random seed (global seed=0). No specific train/test/validation split percentages or counts are provided for the ImageNet dataset in the main text. |
| Hardware Specification | Yes | 8 NVIDIA A100 GPUs are used for main experiments. |
| Software Dependencies | No | Our implementation completely adheres to the training protocol established in REPA [45]. Following the architectural configuration of latent diffusion models [32], we employ the identical VAE variant (sd-vae-ft-ema) and adopt the AdamW optimizer. No specific version numbers for software libraries or frameworks are provided. |
| Experiment Setup | Yes | Our implementation completely adheres to the training protocol established in REPA [45]. [...] we maintain identical hyperparameter settings across all experiments: a global batch size of 256, fixed learning rate of 1e-4, and disabled weight decay (set to 0). (β1, β2) is set as (0.9, 0.999). [...] We select smaller cfg of 1.65, because we found it is better for our architecture, different from SiT. For all ablation experiments, we train models for 100K iterations, which is sufficient to show the trend of model performance; sampling is conducted with the default setting of the official REPA codebase, i.e. cfg = 1.8 in ODE and guidance interval [0, 0.7]. |
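
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch, which also makes it easy to check how the quoted iteration counts relate to epochs. This is an illustration only, not code from the U-REPA or REPA repositories: the names `TrainConfig` and `iterations_to_epochs` are hypothetical, and the training-set size of 1,281,167 images is an assumption based on the standard ImageNet 2012 (ILSVRC) training split, which the table does not state.

```python
from dataclasses import dataclass

# Assumption: size of the standard ImageNet 2012 (ILSVRC) training split.
# The scorecard itself does not report split sizes.
IMAGENET_TRAIN_IMAGES = 1_281_167


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the settings quoted in the table."""
    global_batch_size: int = 256
    learning_rate: float = 1e-4
    weight_decay: float = 0.0           # weight decay is disabled
    betas: tuple = (0.9, 0.999)         # optimizer (beta1, beta2)
    global_seed: int = 0
    ablation_iterations: int = 100_000  # training length for ablations


def iterations_to_epochs(iterations: int, batch_size: int,
                         dataset_size: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Convert optimizer steps into passes over the training set."""
    return iterations * batch_size / dataset_size


cfg = TrainConfig()
full_epochs = iterations_to_epochs(1_000_000, cfg.global_batch_size)
ablation_epochs = iterations_to_epochs(cfg.ablation_iterations,
                                       cfg.global_batch_size)
print(f"1M iterations   = {full_epochs:.1f} epochs")
print(f"100K iterations = {ablation_epochs:.1f} epochs")
```

Under the assumed dataset size, 1M iterations at a global batch size of 256 comes to roughly 200 epochs, which agrees with the "200 epochs or 1M iterations" figure quoted in the Research Type row; the 100K-iteration ablations correspond to roughly 20 epochs.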