Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SparseDiT: Token Sparsification for Efficient Diffusion Transformer

Authors: Shuning Chang, Pichao WANG, Jiasheng Tang, Fan Wang, Yi Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments demonstrate SparseDiT's effectiveness, achieving a 55% reduction in FLOPs and a 175% improvement in inference speed on DiT-XL with similar FID score on 512×512 ImageNet, a 56% reduction in FLOPs across video generation datasets, and a 69% improvement in inference speed on PixArt-α on text-to-image generation task with a 0.24 FID score decrease. SparseDiT provides a scalable solution for high-quality diffusion-based generation compatible with sampling optimization techniques.
Researcher Affiliation	Collaboration	Shuning Chang (1, 2, 3), Pichao Wang (2), Jiasheng Tang (2, 3), Fan Wang (2, 3), Yi Yang (1); 1: Zhejiang University, 2: Damo Academy, Alibaba Group, 3: Hupan Lab; EMAIL
Pseudocode	No	The paper describes the SparseDiT architecture and strategies using text and mathematical equations (e.g., Eqs. 1, 2, 3, 4, 5) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at https://github.com/changsn/SparseDiT.
Open Datasets	Yes	Our experiments demonstrate SparseDiT's effectiveness, achieving a 55% reduction in FLOPs and a 175% improvement in inference speed on DiT-XL with similar FID score on 512×512 ImageNet [11] images, a 56% reduction in FLOPs across video generation datasets, including FaceForensics [45], SkyTimelapse [60], UCF101 [54], and Taichi-HD [50]. Additionally, on the more challenging text-to-image generation task, we achieve a 69% improvement in inference speed on PixArt-α with a 0.24 FID score reduction.
Dataset Splits	Yes	We conduct our experiments on ImageNet-1k [11] at resolutions of 256×256 and 512×512, following the protocol established in DiT. For DiT-XL, the model consists of 2, 24, and 2 transformers in the bottom, middle, and top segments, respectively. ... Following prior works, we sample 50,000 images to compute the Fréchet Inception Distance (FID) [17] using the ADM TensorFlow evaluation suite [12], along with the Inception Score (IS) [46], sFID [38], and Precision-Recall metrics [23].
Hardware Specification	Yes	Throughput is evaluated with a batch size of 128 on an Nvidia A100 GPU.
Software Dependencies	No	The paper mentions using the ADM TensorFlow evaluation suite, but it does not specify version numbers for any key software components or libraries used in their implementation.
Experiment Setup	Yes	All training settings and hyperparameters follow their respective papers. Fine-tuning requires approximately 6% of the time needed for training from scratch, e.g., 400K iterations for DiT-XL fine-tuning. ... Classifier-free guidance [19] (CFG) is set to 1.5 for evaluation and 4.0 for visualization. Throughput is evaluated with a batch size of 128 on an Nvidia A100 GPU.
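
The percentages quoted in the table can be read as multiplicative factors. The sketch below is a plain arithmetic illustration, not code from the paper or its repository. It assumes that "X% reduction in FLOPs" means the remaining cost is (1 - X/100) of the baseline, and that "Y% improvement in inference speed" means throughput is (1 + Y/100) times the baseline. The function names are hypothetical.

```python
def remaining_fraction(percent_reduction: float) -> float:
    """Fraction of the baseline cost left after a percentage reduction (assumed reading)."""
    return 1.0 - percent_reduction / 100.0


def speedup_factor(percent_improvement: float) -> float:
    """Throughput multiplier implied by a percentage speed improvement (assumed reading)."""
    return 1.0 + percent_improvement / 100.0


# Figures quoted in the table above for DiT-XL on 512x512 ImageNet.
dit_xl_flops = remaining_fraction(55.0)
dit_xl_speedup = speedup_factor(175.0)

# Figure quoted for PixArt-alpha on text-to-image generation.
pixart_speedup = speedup_factor(69.0)

print(f"DiT-XL FLOPs relative to baseline: {dit_xl_flops:.2f}x")
print(f"DiT-XL throughput relative to baseline: {dit_xl_speedup:.2f}x")
print(f"PixArt-alpha throughput relative to baseline: {pixart_speedup:.2f}x")
```

Under these assumptions, the DiT-XL result corresponds to roughly 0.45 times the baseline FLOPs and 2.75 times the baseline throughput. If the paper instead measures speed as latency, the factors would need to be reinterpreted.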