Exploring DCN-like Architecture for Fast Image Generation with Arbitrary Resolution
Authors: Shuai Wang, Zexian Li, Tianhui Song, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on 32x32 CIFAR10 and 256x256 ImageNet datasets. The training batch size is set to 256. Similar to SiT [23] and DiT [12], we use Adam optimizer [31] with a constant learning rate 0.0001 during the whole training. We do not adopt any gradient clip techniques for fair comparison. For 32x32 CIFAR10 dataset, we train our model for 25000 steps. As for 256x256 ImageNet dataset, we train for 1.5M steps. We use 8 A100 GPUs as the default training hardware. FlowDCN achieves the state-of-the-art 4.30 sFID on 256x256 ImageNet Benchmark and comparable resolution extrapolation results, surpassing transformer-based counterparts in terms of convergence speed (only 1/5 images), visual quality, parameters (8% reduction) and FLOPs (20% reduction). |
| Researcher Affiliation | Collaboration | Shuai Wang (Nanjing University); Zexian Li (Alibaba Group); Tianhui Song (Nanjing University); Xubin Li (Alibaba Group); Tiezheng Ge (Alibaba Group); Bo Zheng (Alibaba Group); Limin Wang (Nanjing University, Shanghai AI Lab) |
| Pseudocode | No | No explicit pseudocode or algorithm block found in the paper. |
| Open Source Code | No | We plan to opensource our code and implementation later. |
| Open Datasets | Yes | We conduct experiments on 32x32 CIFAR10 and 256x256 ImageNet datasets. The CIFAR10 dataset [35], comprising 50,000 32x32 small-resolution images from 10 distinct class categories, is considered an ideal benchmark to validate the design of our multi-scale deformable block due to its relatively small scale. |
| Dataset Splits | No | We conduct experiments on 32x32 CIFAR10 and 256x256 ImageNet datasets. The training batch size is set to 256. Similar to SiT [23] and DiT [12], we use Adam optimizer [31] with a constant learning rate 0.0001 during the whole training. We do not adopt any gradient clip techniques for fair comparison. For 32x32 CIFAR10 dataset, we train our model for 25000 steps. As for 256x256 ImageNet dataset, we train for 1.5M steps. |
| Hardware Specification | Yes | We use 8 A100 GPUs as the default training hardware. FP16/FP32 results are collected on Nvidia A10 GPU. |
| Software Dependencies | No | No specific version numbers for general software dependencies like Python, PyTorch, or other libraries are provided. |
| Experiment Setup | Yes | The training batch size is set to 256. Similar to SiT [23] and DiT [12], we use Adam optimizer [31] with a constant learning rate 0.0001 during the whole training. ... For 32x32 CIFAR10 dataset, we train our model for 25000 steps. As for 256x256 ImageNet dataset, we train for 1.5M steps. For sampling, we employ the Euler stochastic solver with 1000 sampling steps to generate images. To generate images, we employ an Euler-Maruyama solver with 250 steps for stochastic sampling, with classifier-free guidance at a scale of 1.375. |
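
The quoted training recipe (Adam, constant learning rate 0.0001, batch size 256, no gradient clipping, 25,000 steps on CIFAR10 and 1.5M steps on ImageNet) is concrete enough to sketch in code. Below is a minimal PyTorch sketch of that configuration, assuming a placeholder convolutional model and a schematic linear-path flow-matching loss; the real FlowDCN backbone, its time and class conditioning, and the ImageNet latent-space pipeline are not reproduced here.

```python
import torch

# Hypothetical stand-in for the FlowDCN backbone; the real architecture is not
# reproduced here.
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Quoted configuration: Adam, constant lr = 1e-4, batch size 256, no gradient clipping.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
batch_size = 256
total_steps = 1_500_000  # 1.5M steps for 256x256 ImageNet; 25,000 steps for CIFAR10

def flow_matching_loss(model, x1):
    """Schematic linear-path flow-matching objective (an assumption, not the
    paper's exact formulation)."""
    x0 = torch.randn_like(x1)                      # Gaussian noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform time in [0, 1)
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))
    xt = (1.0 - t_) * x0 + t_ * x1                 # point on the straight noise-to-data path
    target = x1 - x0                               # constant-velocity regression target
    pred = model(xt)                               # the real model also conditions on t and class
    return torch.nn.functional.mse_loss(pred, target)

# Illustrative single optimization step on random data standing in for a real loader:
x1 = torch.randn(batch_size, 3, 32, 32)
loss = flow_matching_loss(model, x1)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```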
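
The sampling row quotes two configurations: a 1000-step Euler stochastic solver and a 250-step Euler-Maruyama solver with classifier-free guidance at 1.375. The sketch below shows only a deterministic Euler variant of such a loop with classifier-free guidance, assuming a velocity-predicting `velocity(x, t, y)` interface, a null class label of 1000, and 4x32x32 latents; the stochastic noise term and the paper's exact conditioning API are simplified away here.

```python
import torch

@torch.no_grad()
def euler_cfg_sample(velocity, x, y, steps=250, cfg_scale=1.375, null_label=1000):
    """Deterministic Euler integration of a learned velocity field with
    classifier-free guidance; a sketch only, omitting the Euler-Maruyama noise term."""
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        v_cond = velocity(x, t, y)                                  # class-conditional velocity
        v_uncond = velocity(x, t, torch.full_like(y, null_label))   # unconditional (null-label) velocity
        v = v_uncond + cfg_scale * (v_cond - v_uncond)              # guided velocity, scale 1.375
        x = x + dt * v                                              # Euler step from noise toward data
    return x

# Usage with a dummy velocity field (assumption; the real model is FlowDCN acting
# on VAE latents for 256x256 images):
dummy_velocity = lambda x, t, y: -x
noise = torch.randn(4, 4, 32, 32)
labels = torch.randint(0, 1000, (4,))
samples = euler_cfg_sample(dummy_velocity, noise, labels)
```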