Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

Authors: Shen Zhang, Siyuan Liang, Yaning Tan, Zhaowei Chen, Linze Li, Ge Wu, Yuhao Chen, Shuheng Li, Zhenyu Zhao, Caihua Chen, Jiajun Liang, Yao Tang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on both conditional and text-to-image generation tasks to validate the effectiveness of LEDiT. Notably, LEDiT supports up to 4× inference resolution scaling while maintaining structural fidelity and fine-grained details, outperforming state-of-the-art extrapolation methods. Moreover, LEDiT can generate images with arbitrary aspect ratios (e.g., 512×384 or 512×256) without any multi-aspect-ratio training techniques.
Researcher Affiliation | Collaboration | (1) JIIOV Technology; (2) Nanjing University; (3) Nankai University
Pseudocode | No | The paper describes the architecture and method using mathematical equations and descriptive text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
Open Source Code | Yes | Project page: https://shenzhang2145.github.io/ledit/ (Also, in the NeurIPS Paper Checklist: 'We release the source code in the supplementary material.')
Open Datasets | Yes | The experiments are trained on ImageNet [6] with 256×256 and 512×512 resolutions, and on COCO [26] with 256×256 resolution.
Dataset Splits | No | The paper mentions training on the ImageNet and COCO datasets at specific resolutions and uses terms like 'generate 50K samples' or 'generate 40,504 images' for evaluation. While these imply an evaluation set, the paper does not explicitly state the training/validation/test splits (e.g., percentages, sample counts for training/validation) or refer to standard predefined splits for these datasets.
Hardware Specification | Yes | We use 8 NVIDIA V100 GPUs as default training hardware.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | For conditional generation on ImageNet [6], we use a patch size p = 2 and follow DiT-XL [31] to set the same layers, hidden size, and attention heads for the XLarge model, denoted by LEDiT-XL/2. For text-to-image generation on COCO [26], we use MMDiT [11] and set the hidden dimension as 768 and the model depth as 24, following the design in REPA [48], denoted as LEMMDiT. On ImageNet, we (i) train the randomly initialized LEDiT for 400K steps or (ii) fine-tune LEDiT for 100K steps. We set the batch size as 256. On COCO, we follow REPA [48] and train LEMMDiT for 200K steps with a batch size of 192. We generate 50K samples using 250 DDPM sampling steps with a classifier-free guidance (CFG) scale of 1.5. On COCO, we generate 40,504 images (one per caption) using 50 ODE sampling steps with CFG=2.0.
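
The settings quoted in the Experiment Setup row can be collected into a small structured record, which makes it easier to compare the three runs and to derive how many training images each run processes (steps times batch size). The sketch below is illustrative only: the class name `RunConfig`, its field names, and the run labels are hypothetical and do not come from the paper or its released code. It also assumes that the 50K-sample, 250-step DDPM evaluation with CFG 1.5 refers to the ImageNet runs, which is how the quoted text reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """One training/evaluation setting (hypothetical field names)."""
    name: str
    dataset: str
    train_steps: int
    batch_size: int
    sampler: str
    sampling_steps: int
    cfg_scale: float
    num_eval_samples: int

    def samples_seen(self) -> int:
        # Images processed during training = optimizer steps x batch size.
        return self.train_steps * self.batch_size


# Values copied from the Experiment Setup row; the labels are illustrative.
CONFIGS = [
    RunConfig("LEDiT-XL/2 (from scratch)", "ImageNet", 400_000, 256, "DDPM", 250, 1.5, 50_000),
    RunConfig("LEDiT-XL/2 (fine-tuned)", "ImageNet", 100_000, 256, "DDPM", 250, 1.5, 50_000),
    RunConfig("LEMMDiT", "COCO", 200_000, 192, "ODE", 50, 2.0, 40_504),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.samples_seen():,} training images seen")
```

Under these figures, the from-scratch ImageNet run processes roughly four times as many training images as the fine-tuning run, since both use the same batch size and differ only in step count.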