Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
Authors: Zanlin Ni, Yulin Wang, Renping Zhou, Yizeng Han, Jiayi Guo, Zhiyuan Liu, Yuan Yao, Gao Huang
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on ImageNet-256² & 512² and MSCOCO validate the effectiveness of ENAT. |
| Researcher Affiliation | Collaboration | Zanlin Ni¹, Yulin Wang¹, Renping Zhou¹, Yizeng Han¹, Jiayi Guo¹, Zhiyuan Liu¹, Yuan Yao², Gao Huang¹; ¹Tsinghua University, ²National University of Singapore |
| Pseudocode | No | The paper describes algorithms and processes textually and with diagrams (e.g., Figure 4), but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code and pre-trained models will be released at https://github.com/LeapLabTHU/ENAT. |
| Open Datasets | Yes | Experiments on ImageNet-256² & 512² and MSCOCO validate the effectiveness of ENAT. |
| Dataset Splits | Yes | Our evaluation on FID follows the same evaluation protocol as [10, 3, 49]. We adopt the pre-computed dataset statistics from [3] and generate 50k samples for ImageNet (30k for MS-COCO) to compute the statistics for the generated samples... |
| Hardware Specification | Yes | All our experiments are conducted with 8 A100 80G GPUs. |
| Software Dependencies | No | The paper mentions utilizing a pretrained VQGAN [13] but does not specify software versions or library dependencies used for implementation or experiments. |
| Experiment Setup | Yes | For ImageNet 256×256, we use a batch size of 2048 and a learning rate of 4e-4. For ImageNet 512×512, to manage the increased sequence length, we reduce the batch size to 512 and linearly scale down the learning rate to 1e-4. For MS-COCO, we train for 150k steps instead of the 1000k steps used in [3]. For our ablation studies in Sec. 5.2 and explorative experiments in Sec. 4, we train the models for 300k steps instead of the 500k steps used in [3], while keeping the other settings the same as above. |
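
The Experiment Setup row reports that the learning rate was scaled down linearly with the batch size when moving from ImageNet 256×256 to 512×512. The following is a minimal sketch of that arithmetic, not the authors' code: the `TrainConfig` class, its field names, and the `linearly_scaled_lr` helper are hypothetical and introduced only for illustration. Only the batch sizes and learning rates come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container; names are illustrative, not from the ENAT code."""
    dataset: str
    batch_size: int
    learning_rate: float


def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate in proportion to the change in batch size."""
    return base_lr * new_batch / base_batch


# Values quoted in the Experiment Setup row.
imagenet_256 = TrainConfig("ImageNet 256x256", batch_size=2048, learning_rate=4e-4)

# Batch size drops from 2048 to 512 (a factor of 4), so the rate drops by 4 as well.
imagenet_512 = TrainConfig(
    "ImageNet 512x512",
    batch_size=512,
    learning_rate=linearly_scaled_lr(
        imagenet_256.learning_rate, imagenet_256.batch_size, 512
    ),
)

if __name__ == "__main__":
    print(imagenet_512)
```

Under these assumptions the scaled value agrees with the reported 1e-4 up to floating-point rounding, which is consistent with the "linearly scale down" wording in the quoted passage.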