Breaking Long-Tailed Learning Bottlenecks: A Controllable Paradigm with Hypernetwork-Generated Diverse Experts

Authors: Zhe Zhao, Haibin Wen, Zikang Wang, Pengkun Wang, Fanfu Wang, Song Lai, Qingfu Zhang, Yang Wang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method not only achieves higher performance ceilings but also effectively overcomes distribution shift while allowing controllable adjustments according to user preferences.
Researcher Affiliation | Collaboration | Zhe Zhao (1,3), Haibin Wen (5), Zikang Wang (1), Pengkun Wang (1,2), Fanfu Wang (6), Song Lai (3), Qingfu Zhang (3), Yang Wang (1,2); (1) University of Science and Technology of China (USTC), Hefei, China; (2) Suzhou Institute for Advanced Research, USTC, Suzhou, China; (3) City University of Hong Kong, Hong Kong, China; (4) Harbin Institute of Technology, Harbin, China; (5) Morong AI, Suzhou, China; (6) Lanzhou University, Lanzhou, China
Pseudocode | Yes | Here is pseudocode explaining the core aspects of the method: Algorithm 1, Diverse Expert Learning with Hypernetworks. (A hedged sketch of this algorithm appears after the table.)
Open Source Code | Yes | The code can be found here: https://github.com/DataLab-atom/PRL.
Open Datasets | Yes | Datasets. We evaluate our method on four benchmark datasets: ImageNet-LT [20], CIFAR100-LT [4], Places-LT [20], and iNaturalist 2018 [29]. These datasets have varying imbalance ratios, ranging from 10 to 256. CIFAR100-LT has three versions with different imbalance ratios. Detailed statistics are in Appendix D.
Dataset Splits | Yes | Datasets. We evaluate our method on four benchmark datasets: ImageNet-LT [20], CIFAR100-LT [4], Places-LT [20], and iNaturalist 2018 [29]. ... Results on standard long-tailed recognition. Table 1 demonstrates the effectiveness of our proposed method, PRL, on four benchmark datasets under the standard long-tailed recognition setting, where the test class distribution is uniform. (A sketch of the standard imbalance-ratio construction of such long-tailed splits follows the table.)
Hardware Specification | No | The paper mentions models like ResNet-32, ResNeXt-50, and ResNet-50 in its complexity analysis (Table 9), but it does not specify the hardware (e.g., specific GPU or CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper specifies hyperparameters and optimizers (e.g., 'SGD with momentum 0.9'), but it does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9' or 'Python 3.8').
Experiment Setup | Yes | We use the same setup for all methods, including ResNeXt-50 for ImageNet-LT, ResNet-32 for CIFAR100-LT, ResNet-152 for Places-LT, and ResNet-50 for iNaturalist 2018 as backbones. We employ hypernets (MLPs) to output trainable parameters of experts and adopt the cosine classifier for prediction. Unless specified, we use α = 1.2 for the Dirichlet distribution, µ = 0.3 for stochastic annealing, SGD with momentum 0.9, train for 200 epochs, and set the initial learning rate to 0.1 with linear decay. During test-time training, we train aggregation weights for 5 epochs with a batch size of 128, using the same optimizer and learning rate as in training. (The reported optimization settings are sketched in code after the table.)
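The pseudocode row above names Algorithm 1, Diverse Expert Learning with Hypernetworks. Below is a minimal sketch of how an MLP hypernetwork can generate cosine-classifier weights for multiple diverse experts, assuming the conditioning preference vector is drawn from a Dirichlet distribution with α = 1.2 as stated in the setup row; the class HyperExpertHead, its dimensions, and the expert count are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): each expert's cosine-classifier weights
# are generated by an MLP hypernetwork conditioned on a Dirichlet-sampled preference.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperExpertHead(nn.Module):
    """Maps a preference vector to a cosine-classifier weight matrix."""

    def __init__(self, num_classes: int, feat_dim: int, pref_dim: int, hidden: int = 256):
        super().__init__()
        self.num_classes, self.feat_dim = num_classes, feat_dim
        self.hypernet = nn.Sequential(                     # MLP hypernetwork
            nn.Linear(pref_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes * feat_dim),
        )

    def forward(self, features: torch.Tensor, preference: torch.Tensor) -> torch.Tensor:
        w = self.hypernet(preference).view(self.num_classes, self.feat_dim)
        # Cosine classifier: normalized features against normalized class weights.
        return F.normalize(features, dim=-1) @ F.normalize(w, dim=-1).t()


# Usage: sample one preference per expert from Dirichlet(alpha=1.2) and average logits.
alpha, num_experts, pref_dim = 1.2, 3, 3
head = HyperExpertHead(num_classes=100, feat_dim=64, pref_dim=pref_dim)
features = torch.randn(8, 64)                              # stand-in backbone features
dirichlet = torch.distributions.Dirichlet(torch.full((pref_dim,), alpha))
logits = torch.stack([head(features, dirichlet.sample()) for _ in range(num_experts)]).mean(0)
print(logits.shape)                                        # torch.Size([8, 100])
```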
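The dataset rows quote imbalance ratios from 10 to 256 but do not show how the long-tailed splits are built. The snippet below is a hedged sketch of the exponential-decay recipe commonly used to derive CIFAR100-LT-style per-class counts from an imbalance ratio; it reflects standard practice in the long-tailed literature, not code from this paper.

```python
# Hedged sketch of the common exponential-decay recipe for long-tailed splits
# (imbalance ratio = largest class count / smallest class count); not from the paper.
def long_tailed_class_sizes(num_classes=100, max_per_class=500, imbalance_ratio=100):
    """Per-class sample counts decaying exponentially from head to tail classes."""
    sizes = []
    for cls in range(num_classes):
        frac = cls / (num_classes - 1)           # 0 for the head class, 1 for the tail class
        sizes.append(int(max_per_class * (1.0 / imbalance_ratio) ** frac))
    return sizes


sizes = long_tailed_class_sizes(imbalance_ratio=100)
print(sizes[0], sizes[-1], sizes[0] // sizes[-1])  # 500 5 100
```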
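Finally, the experiment-setup row reports SGD with momentum 0.9, 200 training epochs, an initial learning rate of 0.1 with linear decay, and 5 epochs of test-time training of the aggregation weights at batch size 128. The PyTorch fragment below mirrors those reported hyperparameters under placeholder model and expert counts; it is a sketch of the settings, not the authors' training code.

```python
# Hedged sketch of the reported optimization settings; model and expert count are placeholders.
import torch

model = torch.nn.Linear(64, 100)                    # stand-in for backbone + hypernet experts
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
epochs = 200
scheduler = torch.optim.lr_scheduler.LambdaLR(      # initial lr 0.1 with linear decay
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / epochs)

# Test-time training: only the expert-aggregation weights are optimized, 5 epochs,
# batch size 128, same optimizer type and learning rate as in training.
agg_weights = torch.nn.Parameter(torch.ones(3) / 3)  # one weight per expert (3 assumed)
tta_optimizer = torch.optim.SGD([agg_weights], lr=0.1, momentum=0.9)
tta_epochs, tta_batch_size = 5, 128
```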