Exploring DCN-like architecture for fast image generation with arbitrary resolution

Authors: Shuai Wang, Zexian Li, Tianhui Song, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on 32x32 CIFAR10 and 256x256 ImageNet datasets. The training batch size is set to 256. Similar to SiT [23] and DiT [12], we use Adam optimizer [31] with a constant learning rate 0.0001 during the whole training. We do not adopt any gradient clip techniques for fair comparison. For 32x32 CIFAR10 dataset, we train our model for 25000 steps. As for 256x256 ImageNet dataset, we train for 1.5M steps. We use 8 A100 GPUs as the default training hardware. FlowDCN achieves the state-of-the-art 4.30 sFID on 256x256 ImageNet Benchmark and comparable resolution extrapolation results, surpassing transformer-based counterparts in terms of convergence speed (only 1/5 images), visual quality, parameters (8% reduction) and FLOPs (20% reduction).
Researcher Affiliation | Collaboration | Shuai Wang (Nanjing University); Zexian Li (Alibaba Group); Tianhui Song (Nanjing University); Xubin Li (Alibaba Group); Tiezheng Ge (Alibaba Group); Bo Zheng (Alibaba Group); Limin Wang (Nanjing University, Shanghai AI Lab)
Pseudocode | No | No explicit pseudocode or algorithm block found in the paper.
Open Source Code | No | We plan to open-source our code and implementation later.
Open Datasets | Yes | We conduct experiments on 32x32 CIFAR10 and 256x256 ImageNet datasets. The CIFAR10 dataset [35], comprising 50,000 32x32 small-resolution images from 10 distinct class categories, is considered an ideal benchmark to validate the design of our MultiScale deformable block due to its relatively small scale.
Dataset Splits | No | We conduct experiments on 32x32 CIFAR10 and 256x256 ImageNet datasets. The training batch size is set to 256. Similar to SiT [23] and DiT [12], we use Adam optimizer [31] with a constant learning rate 0.0001 during the whole training. We do not adopt any gradient clip techniques for fair comparison. For 32x32 CIFAR10 dataset, we train our model for 25000 steps. As for 256x256 ImageNet dataset, we train for 1.5M steps.
Hardware Specification | Yes | We use 8 A100 GPUs as the default training hardware. FP16/FP32 results are collected on Nvidia A10 GPU.
Software Dependencies | No | No specific version numbers for general software dependencies like Python, PyTorch, or other libraries are provided.
Experiment Setup | Yes | The training batch size is set to 256. Similar to SiT [23] and DiT [12], we use Adam optimizer [31] with a constant learning rate 0.0001 during the whole training. ... For 32x32 CIFAR10 dataset, we train our model for 25000 steps. As for 256x256 ImageNet dataset, we train for 1.5M steps. For sampling, we employ the Euler stochastic solver with 1000 sampling steps to generate images. To generate images, we employ an Euler-Maruyama solver with 250 steps for stochastic sampling. ... classifier-free guidance with 1.375.
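
The quoted setup reduces to a small set of concrete hyperparameters. The sketch below restates them in PyTorch and shows a generic Euler-Maruyama sampling loop with classifier-free guidance at the reported scale of 1.375. Since the code is not released, the `velocity_model` callable, the constant noise scale `sigma`, and the exact step and guidance bookkeeping are assumptions for illustration, not the authors' implementation; only the numeric settings (batch size 256, constant Adam learning rate 1e-4, no gradient clipping, 250 Euler-Maruyama steps, guidance 1.375) come from the table above.

```python
import torch

# Hyperparameters quoted in the table above (FlowDCN, 256x256 ImageNet setting).
BATCH_SIZE = 256            # training batch size
LEARNING_RATE = 1e-4        # constant Adam learning rate, no schedule
NUM_SAMPLING_STEPS = 250    # Euler-Maruyama steps (the CIFAR10 setting quotes 1000 Euler steps)
CFG_SCALE = 1.375           # classifier-free guidance scale


def make_optimizer(model: torch.nn.Module) -> torch.optim.Adam:
    # Constant-LR Adam as described in the paper; no gradient clipping is applied.
    return torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)


@torch.no_grad()
def euler_maruyama_sample(velocity_model, x, labels, null_labels, sigma=1.0):
    """Generic Euler-Maruyama loop with classifier-free guidance.

    `velocity_model(x, t, y)` is a placeholder callable returning the drift
    (velocity) field; `sigma` is an assumed constant noise scale, not the
    paper's diffusion schedule.
    """
    dt = 1.0 / NUM_SAMPLING_STEPS
    for i in range(NUM_SAMPLING_STEPS):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        # Classifier-free guidance: blend conditional and unconditional predictions.
        v_cond = velocity_model(x, t, labels)
        v_uncond = velocity_model(x, t, null_labels)
        v = v_uncond + CFG_SCALE * (v_cond - v_uncond)
        # Deterministic drift step plus a Gaussian noise term (omitted at the last step).
        noise = torch.randn_like(x) if i < NUM_SAMPLING_STEPS - 1 else torch.zeros_like(x)
        x = x + v * dt + sigma * (dt ** 0.5) * noise
    return x
```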