Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sparse Image Synthesis via Joint Latent and RoI Flow

Authors: Ziteng Gao, Jay Zhangjie Wu, Mike Zheng Shou

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our sparse flow-based transformers have competitive performance compared with dense grid-based counterparts with significantly reduced lower compute, and reaches a competitive 2.76 FID with just 64 latents on class-conditional ImageNet 256×256 generation.
Researcher Affiliation | Academia | Show Lab, National University of Singapore
Pseudocode | No | The paper describes the model architecture and training procedures using text and mathematical equations (e.g., Sections 3.1 and 3.2 and Equations 1-13), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | Now our codebase is not ready for being publicly available. We will release the code upon acceptance.
Open Datasets | Yes | To verify the feasibility of synthesizing images with sparse non-grid latents, we conduct experiments on the standard ImageNet benchmark [35], mainly on 256×256 images.
Dataset Splits | Yes | We perform reconstruction FID evaluation [38] on the ImageNet validation 50K samples.
Hardware Specification | Yes | it have already took 7 days on 8 A100s to complete 1.4M steps
Software Dependencies | No | The paper mentions using the Adam optimizer and DINOv2-B/14 for specific components, but it does not specify version numbers for any software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We train the SF-VAE on the ImageNet training set with a batch size of 128 for 320K iterations (equivalent to 32 epochs) with a learning rate of 10^-4. The training configurations are the same as the original SiT models, i.e., the global batch is 256 and the Adam optimizer [43] with constant learning rate 10^-4. We set the default β to 2 in the asynchronous interpolating schedule and the weight w_L1 to 0.2 for the additional L1 loss on RoI velocity prediction.
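
The figures quoted under Experiment Setup and Hardware Specification can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes the ImageNet-1k training set contains 1,281,167 images (a figure not stated on this page), and the function names are hypothetical. The quoted hardware sentence does not say which model the 1.4M steps refer to, so the throughput figure is a rough average rather than a benchmark.

```python
# Sanity checks on the reported training figures.
# Assumption: ImageNet-1k training-set size; this number is not given in the text above.
IMAGENET_TRAIN_IMAGES = 1_281_167


def epochs_seen(batch_size: int, iterations: int,
                dataset_size: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Number of passes over the dataset after `iterations` steps."""
    return batch_size * iterations / dataset_size


def gpu_hours(days: float, num_gpus: int) -> float:
    """Total accelerator time in GPU-hours."""
    return days * 24 * num_gpus


def steps_per_second(steps: int, days: float) -> float:
    """Average optimizer steps per wall-clock second."""
    return steps / (days * 24 * 3600)


# SF-VAE: batch size 128 for 320K iterations, reported as about 32 epochs.
sf_vae_epochs = epochs_seen(batch_size=128, iterations=320_000)

# Hardware quote: 7 days on 8 A100s for 1.4M steps.
total_gpu_hours = gpu_hours(days=7, num_gpus=8)
avg_steps_per_s = steps_per_second(steps=1_400_000, days=7)

if __name__ == "__main__":
    print(f"SF-VAE epochs: {sf_vae_epochs:.2f}")
    print(f"GPU-hours: {total_gpu_hours:.0f}")
    print(f"Average steps/s: {avg_steps_per_s:.2f}")
```

Under the stated dataset-size assumption, the SF-VAE schedule works out to just under 32 epochs, which agrees with the "equivalent to 32 epochs" statement in the table.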