Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sparse Image Synthesis via Joint Latent and RoI Flow
Authors: Ziteng Gao, Jay Zhangjie Wu, Mike Zheng Shou
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our sparse flow-based transformers have competitive performance compared with dense grid-based counterparts with significantly reduced lower compute, and reaches a competitive 2.76 FID with just 64 latents on class-conditional ImageNet 256×256 generation. |
| Researcher Affiliation | Academia | Show Lab, National University of Singapore |
| Pseudocode | No | The paper describes the model architecture and training procedures using text and mathematical equations (e.g., Section 3.1, 3.2, and equations 1-13), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Now our codebase is not ready for being publicly available. We will release the code upon acceptance. |
| Open Datasets | Yes | To verify the feasibility of synthesizing images with sparse non-grid latents, we conduct experiments on the standard ImageNet benchmark [35], mainly on 256×256 images. |
| Dataset Splits | Yes | We perform reconstruction FID evaluation [38] on the ImageNet validation 50K samples. |
| Hardware Specification | Yes | it have already took 7 days on 8 A100s to complete 1.4M steps |
| Software Dependencies | No | The paper mentions using Adam optimizer and DINOv2-B/14 for specific components, but it does not specify version numbers for any software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train the SF-VAE on the ImageNet training set with a batch size of 128 for 320K iterations (equivalent to 32 epochs) with a learning rate of $10^{-4}$. The training configurations are the same as the original SiT models, i.e., the global batch is 256 and the Adam optimizer [43] with constant learning rate $10^{-4}$. We set the default β to 2 in the asynchronous interpolating schedule and the weight $w_{L1}$ to 0.2 for the additional L1 loss on RoI velocity prediction. |
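
The "320K iterations (equivalent to 32 epochs)" figure quoted in the Experiment Setup row can be sanity-checked with a short calculation. The sketch below is not from the paper: the function name `epochs_seen` is hypothetical, and the training-set size of 1,281,167 images assumes the standard ILSVRC-2012 ImageNet-1k split, which the quoted text does not state explicitly.

```python
# Assumption: the standard ILSVRC-2012 (ImageNet-1k) training split.
IMAGENET_TRAIN_IMAGES = 1_281_167


def epochs_seen(iterations: int, batch_size: int,
                dataset_size: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Return how many passes over the dataset a training run makes."""
    return iterations * batch_size / dataset_size


# SF-VAE setup quoted in the table: batch size 128 for 320K iterations.
sf_vae_epochs = epochs_seen(iterations=320_000, batch_size=128)
print(f"SF-VAE: about {sf_vae_epochs:.2f} epochs")
```

Under that assumption the result is just under 32 epochs, which agrees with the figure quoted from the paper.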