LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-tailed Problems

Authors: Pengkun Wang, Zhe Zhao, Haibin Wen, Fanfu Wang, Binwu Wang, Qingfu Zhang, Yang Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conducted extensive experiments on multiple mainstream long-tailed learning benchmarks. The results show LLM-AutoDA outperforms state-of-the-art data augmentation methods and other re-balancing methods significantly."
Researcher Affiliation | Collaboration | 1University of Science and Technology of China (USTC), Hefei, China; 2Suzhou Institute for Advanced Research, USTC, Suzhou, China; 3City University of Hong Kong, Hong Kong, China; 5Morong AI, Suzhou, China; 6Lanzhou University, Lanzhou, China
Pseudocode | Yes | Appendix C Pseudocode
Open Source Code | Yes | "The code is available at https://github.com/DataLab-atom/LLM-LT-AUG."
Open Datasets | Yes | "Like most long-tailed learning methods, we conducted experiments on several mainstream long-tailed learning datasets, including CIFAR-100-LT [5], ImageNet-LT [26], and iNaturalist 2018 [38]."
Dataset Splits | Yes | "To evaluate the performance of candidate data augmentation strategies, we insert them into the model training process, conduct a small amount of additional training on the training set, and then test the accuracy on the validation set, using the accuracy as the fitness score for the algorithm."
Hardware Specification | Yes | "We trained and evaluated the models on 2 NVIDIA Tesla A100 GPUs and reported the experimental results."
Software Dependencies | No | "All our models are implemented based on PyTorch [27]. We utilized the powerful gpt-3.5-turbo for strategy generation and employed AEL [23] for strategy optimization."
Experiment Setup | Yes | "In the experimental process, we first trained the models for 50 epochs without using augmentation strategies, then continued training with augmentation strategies for an additional 20 epochs, employing a novel evaluation mechanism."
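The fitness evaluation described in the Dataset Splits row (brief extra training with a candidate strategy, then validation accuracy as the fitness score) can be sketched as below. This is a minimal illustration, not the authors' implementation: `evaluate_fitness`, `select_best`, `train_fn`, and `eval_fn` are hypothetical names standing in for the paper's training loop and validation pass.

```python
# Hedged sketch of fitness-based strategy evaluation for LLM-generated
# augmentation candidates. All helper names here are assumptions.

def evaluate_fitness(strategy, train_fn, eval_fn, epochs=2):
    """Plug a candidate augmentation strategy into a small amount of
    additional training, then use validation accuracy as its fitness."""
    model = train_fn(strategy, epochs=epochs)  # brief extra training on the train set
    return eval_fn(model)                      # accuracy on the validation set

def select_best(candidates, train_fn, eval_fn):
    """Score each candidate strategy; return (fitness, strategy) of the best."""
    scored = [(evaluate_fitness(s, train_fn, eval_fn), s) for s in candidates]
    return max(scored, key=lambda pair: pair[0])
```

In an evolutionary loop such as AEL, `select_best` would feed the survivors back into the next round of LLM-driven mutation and crossover.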
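The Software Dependencies row notes that gpt-3.5-turbo is used for strategy generation. A rough sketch of what such a call could look like is below; the prompt wording, the `build_strategy_prompt` helper, and the use of the OpenAI chat completions client are all assumptions for illustration, not the authors' actual prompts or code.

```python
# Hedged sketch: asking an LLM for a candidate augmentation strategy
# targeting the tail of a long-tailed class distribution.

def build_strategy_prompt(class_counts):
    """Compose a prompt (hypothetical wording) asking the LLM to write an
    augmentation function focused on the rarest classes."""
    tail = sorted(class_counts, key=class_counts.get)[:3]  # three rarest classes
    return (
        "You are designing a data augmentation strategy for a long-tailed "
        f"image dataset. The rarest classes are {tail}. "
        "Return a Python function `augment(image, label)` that oversamples "
        "and transforms samples from these tail classes."
    )

def generate_strategy(client, class_counts, model="gpt-3.5-turbo"):
    """Send the prompt via an OpenAI-style chat completions client
    (requires network access; not exercised here)."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_strategy_prompt(class_counts)}],
    )
    return resp.choices[0].message.content
```

The returned text would then be parsed into executable augmentation code before being scored by the fitness evaluation.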
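The two-phase schedule in the Experiment Setup row (50 epochs of plain training, then 20 more with augmentation strategies enabled) can be sketched as follows. `run_schedule` and `train_step` are hypothetical names; only the 50/20 epoch split comes from the paper.

```python
# Hedged sketch of the two-phase training schedule. The `train_step`
# callable stands in for one epoch of the actual training loop.

def run_schedule(train_step, warmup_epochs=50, aug_epochs=20, strategy=None):
    """Phase 1: warmup_epochs epochs with no augmentation.
    Phase 2: aug_epochs further epochs with the candidate strategy applied."""
    history = []
    for epoch in range(warmup_epochs):
        history.append(train_step(epoch, augmentation=None))
    for epoch in range(warmup_epochs, warmup_epochs + aug_epochs):
        history.append(train_step(epoch, augmentation=strategy))
    return history
```

Warming up without augmentation gives every candidate strategy the same starting checkpoint, so the short augmented phase isolates the strategy's contribution to validation accuracy.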