DiffuLT: Diffusion for Long-tail Recognition Without External Knowledge

Authors: Jie Shao, Ke Zhu, Hanxiao Zhang, Jianxin Wu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach, termed Diffusion model for Long-Tail recognition (DiffuLT), represents a pioneering application of generative models in long-tail recognition. DiffuLT achieves state-of-the-art results on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT, surpassing leading competitors by significant margins. Comprehensive ablations enhance the interpretability of our pipeline. (A schematic sketch of such a generate-then-classify pipeline is given after the table.)
Researcher Affiliation | Academia | Jie Shao, Ke Zhu, Hanxiao Zhang, Jianxin Wu. National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China. {shaoj, zhuk, zhanghx}@lamda.nju.edu.cn, wujx2001@nju.edu.cn
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes procedures in text and provides mathematical equations.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The code related to this paper will be released as open source after it is accepted.
Open Datasets | Yes | Our research evaluates three long-tailed datasets: CIFAR10-LT (Cao et al. [2019a]), CIFAR100-LT (Cao et al. [2019a]), and ImageNet-LT (Liu et al. [2019a]). (A sketch of the standard CIFAR-LT count construction is given after the table.)
Dataset Splits | No | The paper describes the composition of the training data and mentions the test set, but does not explicitly state a validation split or how one is derived, if used. For instance, it mentions 'The original CIFAR100 and CIFAR10 datasets each consist of a training set with 50,000 images...' and 'The test set of ImageNet-LT mirrors that of ImageNet, containing 100,000 images', but no explicit validation split details are given.
Hardware Specification | Yes | All training tasks are conducted on 8 NVIDIA GeForce RTX 3090 GPUs, with further details discussed in Appendix B.
Software Dependencies | No | The paper mentions software components and frameworks such as CBDM (Qin et al. [2023]) and Adaptive Augmentation (Karras et al. [2020]), and notes that the codebase is adapted from Dhariwal and Nichol [2021], but it does not provide specific version numbers for these or other key software dependencies.
Experiment Setup | Yes | We set α = 0.1 and ω = 0.3. The generation thresholds Nt for CIFAR10-LT and CIFAR100-LT are fixed at 5,000 and 500, respectively. We employ ResNet-32 as the classifier backbone. For ImageNet-LT experiments, we set a generation threshold of Nt = 300; the classifiers are based on ResNet-10 and ResNet-50 architectures with ω = 0.5. We set the diffusion training duration to 500,000 steps, with hyperparameters τ and γ fixed at 1 and 0.25, as per the cited study. The batch size is maintained at 128, the diffusion process runs for 1,000 time steps, and the learning rate is set at 0.0002 using an Adam optimizer. For classifier training, we follow the code and protocols from Zhou et al. [2020a], which prescribe a 200-epoch training regimen. The classifier training also employs a batch size of 128, utilizing an SGD optimizer with a learning rate of 0.1. (The reported values are collected in the configuration summary after the table.)
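To make the DiffuLT pipeline referenced in the table more concrete, the following is a minimal, hypothetical Python sketch, not the authors' released code. It assumes the general recipe described in the paper: a class-conditional diffusion model trained on the long-tailed data itself generates extra samples until each class reaches its threshold Nt, and synthetic samples are down-weighted when training the classifier. The function names, the toy class counts, and the use of ω as a per-sample weight are illustrative assumptions; the paper's exact filtering and weighting rules may differ.

```python
# Hypothetical sketch of a DiffuLT-style generate-then-classify loop (NumPy only).
# generate_with_diffusion() is a stand-in for sampling from a class-conditional
# diffusion model; here it returns random vectors so the script runs end to end.
import numpy as np

rng = np.random.default_rng(0)

def generate_with_diffusion(class_id: int, n: int, dim: int = 32 * 32 * 3) -> np.ndarray:
    """Placeholder: sample n synthetic images of `class_id` from the diffusion model."""
    return rng.normal(size=(n, dim)).astype(np.float32)

def fill_to_threshold(real_counts: dict, n_t: int) -> dict:
    """How many synthetic samples each class needs to reach the generation threshold N_t."""
    return {c: max(n_t - n, 0) for c, n in real_counts.items()}

# Toy long-tailed class counts (10 classes), with N_t = 5000 as on CIFAR10-LT.
real_counts = {c: int(5000 * 0.9 ** c) for c in range(10)}
budget = fill_to_threshold(real_counts, n_t=5000)

omega = 0.3  # assumed role: down-weight for synthetic samples in classifier training
for c, n_gen in budget.items():
    synthetic = generate_with_diffusion(c, n_gen)
    # Real samples keep weight 1.0; synthetic samples get weight omega.
    weights = np.concatenate([np.ones(real_counts[c]), np.full(n_gen, omega)])
    print(f"class {c}: {real_counts[c]} real + {synthetic.shape[0]} synthetic, "
          f"effective weight {weights.sum():.0f}")
```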
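For reference, the CIFAR10-LT and CIFAR100-LT datasets cited in the table are conventionally built by subsampling the balanced CIFAR training sets with an exponential class-count profile (Cao et al. [2019a]). The snippet below sketches only that count schedule; the imbalance factor of 100 is a common setting rather than a value stated in this summary, and the snippet does not download or split any data.

```python
# Exponential long-tail profile used to build CIFAR10-LT / CIFAR100-LT:
# class i keeps n_max * imbalance_factor ** (-i / (num_classes - 1)) images.
def long_tail_counts(n_max: int, num_classes: int, imbalance_factor: float) -> list:
    return [
        int(n_max * imbalance_factor ** (-i / (num_classes - 1)))
        for i in range(num_classes)
    ]

# CIFAR10-LT with imbalance factor 100: the head class keeps all 5,000 images,
# the tail class keeps only 50.
print(long_tail_counts(n_max=5000, num_classes=10, imbalance_factor=100))
# -> [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]
```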
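Finally, the hyperparameters reported in the experiment-setup row can be collected into a single reference structure. The grouping and key names below are our own; the values are taken from the setup described above.

```python
# Reported DiffuLT training configuration, gathered into one dict for reference.
DIFFULT_CONFIG = {
    "diffusion": {
        "training_steps": 500_000,
        "timesteps": 1_000,
        "batch_size": 128,
        "optimizer": "Adam",
        "learning_rate": 2e-4,
        "tau": 1.0,    # τ and γ fixed per the cited study
        "gamma": 0.25,
    },
    "generation": {
        "alpha": 0.1,
        "N_t": {"CIFAR10-LT": 5000, "CIFAR100-LT": 500, "ImageNet-LT": 300},
    },
    "classifier": {
        "backbone": {"CIFAR-LT": "ResNet-32", "ImageNet-LT": ["ResNet-10", "ResNet-50"]},
        "omega": {"CIFAR-LT": 0.3, "ImageNet-LT": 0.5},
        "epochs": 200,          # protocol from Zhou et al. [2020a]
        "batch_size": 128,
        "optimizer": "SGD",
        "learning_rate": 0.1,
    },
}
```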