Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
FiT: Flexible Vision Transformer for Diffusion Model
Authors: Zeyu Lu, Zidong Wang, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, Lei Bai
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate the exceptional performance of FiT across a broad range of resolutions. Repository available at https://github.com/whlzy/FiT. |
| Researcher Affiliation | Collaboration | 1Shanghai Artificial Intelligence Laboratory 2Shanghai Jiao Tong University 3Tsinghua University 4Sydney University 5The University of Hong Kong. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Repository available at https://github.com/whlzy/FiT. |
| Open Datasets | Yes | We train class-conditional latent FiT models under predetermined maximum resolution limitation, HW <= 256^2 (equivalent to token length L <= 256), on the ImageNet (Deng et al., 2009) dataset. |
| Dataset Splits | No | The paper mentions general training settings (learning rate, batch size, EMA, diffusion hyper-parameters) but does not provide specific dataset split information (percentages, counts) for training, validation, or test sets, nor does it cite predefined splits with specification. |
| Hardware Specification | No | The paper mentions 'GPU hardware' as a constraint but does not provide specific details on the GPU models (e.g., NVIDIA A100), CPU models, or cloud computing instance types used for the experiments. |
| Software Dependencies | No | The paper mentions software like AdamW, TensorFlow, and Stable Diffusion, but it does not specify exact version numbers for these software components, which is required for reproducible dependency descriptions. |
| Experiment Setup | Yes | We use the same training setting as DiT: a constant learning rate of 1e-4 using AdamW (Loshchilov & Hutter, 2017), no weight decay, and a batch size of 256. Following common practice in the generative modeling literature, we adopt an exponential moving average (EMA) of model weights over training with a decay of 0.9999. All results are reported using the EMA model. We retain the same diffusion hyper-parameters as DiT. |
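
The exponential moving average quoted in the Experiment Setup row can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' code: the function name `ema_update`, the dict-of-floats stand-in for model weights, and the toy gradient are all hypothetical. Only the decay of 0.9999 and the learning rate of 1e-4 come from the quoted text.

```python
from typing import Dict


def ema_update(ema: Dict[str, float],
               params: Dict[str, float],
               decay: float = 0.9999) -> None:
    """Update the averaged weights in place.

    Applies ema <- decay * ema + (1 - decay) * params for each entry.
    `ema` and `params` are hypothetical stand-ins for model weights.
    """
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


# Toy usage: one scalar "weight" and a made-up constant gradient.
params = {"w": 1.0}
ema = dict(params)            # start the average at the initial weights
learning_rate = 1e-4          # value quoted in the Experiment Setup row
for _ in range(3):
    toy_gradient = 1.0        # assumption: placeholder, not a real loss gradient
    params["w"] -= learning_rate * toy_gradient
    ema_update(ema, params)   # the averaged copy trails the raw weights
```

With a decay this close to 1, the averaged copy changes slowly and smooths out step-to-step noise in the raw weights, which is why results are typically reported from the averaged model rather than the raw one.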