Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition

Authors: Zhiyong Yang, Qianqian Xu, Zitai Wang, Sicong Li, Boyu Han, Shilong Bao, Xiaochun Cao, Qingming Huang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper explores test-agnostic long-tail recognition, a challenging long-tail task where the test label distributions are unknown and arbitrarily imbalanced. We argue that the variation in these distributions can be broken down hierarchically into global and local levels. The global ones reflect a broad range of diversity, while the local ones typically arise from milder changes, often focused on a particular neighbor. Traditional methods predominantly use a Mixture-of-Expert (MoE) approach, targeting a few fixed test label distributions that exhibit substantial global variations. However, the local variations are left unconsidered. To address this issue, we propose a new MoE strategy, DirMixE, which assigns experts to different Dirichlet meta-distributions of the label distribution, each targeting a specific aspect of local variations. Additionally, the diversity among these Dirichlet meta-distributions inherently captures global variations. This dual-level approach also leads to a more stable objective function, allowing us to sample different test distributions to better quantify the mean and variance of performance outcomes. Theoretically, we show that our proposed objective benefits from enhanced generalization by virtue of the variance-based regularization. Comprehensive experiments across multiple benchmarks confirm the effectiveness of DirMixE. (A toy sketch of this Dirichlet sampling step appears after this table.)
Researcher Affiliation | Academia | 1 School of Computer Science and Tech., University of Chinese Academy of Sciences; 2 Key Lab. of Intelligent Information Processing, Institute of Computing Tech., CAS; 3 Institute of Information Engineering, CAS; 4 School of Cyber Security, University of Chinese Academy of Sciences; 5 School of Cyber Science and Tech., Shenzhen Campus of Sun Yat-sen University; 6 BDKM, University of Chinese Academy of Sciences. Correspondence to: Qianqian Xu <xuqianqian@ict.ac.cn>, Qingming Huang <qmhuang@ucas.ac.cn>.
Pseudocode | Yes | Algorithm 1 Training Algorithm
Open Source Code | Yes | The code is available at https://github.com/scongl/DirMixE.
Open Datasets | Yes | We conduct experiments on three popular benchmark datasets for imbalanced learning: (a) CIFAR-10-LT and CIFAR-100-LT datasets. The original CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009)... (b) ImageNet-LT dataset. We adopt the ImageNet-LT dataset proposed by (Liu et al., 2019)...
Dataset Splits | Yes | The original CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009) have 50,000 images for training and 10,000 images for validation with 10 and 100 categories, respectively.
Hardware Specification | No | The paper states, “we re-implement the above methods using their publicly available code and conduct experiments on the same device.” However, it does not provide specific details about the hardware used, such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions using “ResNeXt-50” and “ResNet-32” as backbones, and training with “Stochastic Gradient Descent (SGD).” However, it does not list specific software dependencies with version numbers, such as PyTorch, TensorFlow, CUDA, or other libraries.
Experiment Setup | Yes | In CIFAR-LT experiments, we train the model for 200 epochs using Stochastic Gradient Descent (SGD). The initial learning rate is set at 0.1, with 0.9 momentum rate and 128 batch size. Moreover, a step learning rate schedule is adopted, which reduces the learning rate by a factor of 10 at the 160-th and 180-th epoch, respectively. Regarding the ImageNet-LT dataset, the model is trained for 180 epochs using SGD. Here, the initial learning rate is 0.025 with 0.9 momentum and 64 batch size. Then, the learning rate is adjusted through a cosine annealing schedule, which gradually declines from 0.025 to 0 over 180 epochs.
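The quoted optimization settings map onto standard optimizer and scheduler configurations. Below is a minimal PyTorch sketch of those settings only; the paper does not name its framework or versions, and the backbone, loss, and data pipeline here are placeholders rather than the authors' implementation.

```python
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR, CosineAnnealingLR

# Stand-in for the ResNet-32 / ResNeXt-50 backbones named in the paper.
model = nn.Linear(512, 100)

# CIFAR-LT: 200 epochs, SGD with lr 0.1, momentum 0.9, batch size 128,
# and a step schedule dividing the lr by 10 at epochs 160 and 180.
cifar_optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
cifar_scheduler = MultiStepLR(cifar_optimizer, milestones=[160, 180], gamma=0.1)

# ImageNet-LT: 180 epochs, SGD with lr 0.025, momentum 0.9, batch size 64,
# and cosine annealing from 0.025 down to 0 over the 180 epochs.
imagenet_optimizer = optim.SGD(model.parameters(), lr=0.025, momentum=0.9)
imagenet_scheduler = CosineAnnealingLR(imagenet_optimizer, T_max=180, eta_min=0.0)

for epoch in range(200):
    # ... one training epoch over the CIFAR-LT loader would run here ...
    cifar_scheduler.step()  # the learning rate is updated once per epoch
```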
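The abstract assigns each expert a Dirichlet meta-distribution of the test label distribution and samples test distributions to quantify the mean and variance of performance. The sketch below illustrates only that sampling-and-summarizing step; the mean directions, concentration value, and placeholder metric are illustrative assumptions, not the authors' choices.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10

# Hypothetical mean directions for three Dirichlet meta-distributions:
# head-skewed, roughly uniform, and tail-skewed test label profiles.
means = [
    np.linspace(2.0, 0.2, num_classes),
    np.ones(num_classes),
    np.linspace(0.2, 2.0, num_classes),
]
concentration = 50.0  # larger values keep samples near the mean (local variation)

def sample_label_distributions(mean, n=100):
    """Draw n test label distributions from Dirichlet(concentration * normalized mean)."""
    alpha = concentration * mean / mean.sum()
    return rng.dirichlet(alpha, size=n)

def placeholder_metric(label_dist):
    """Stand-in for a performance measure evaluated under a sampled test distribution."""
    return float(label_dist @ np.arange(num_classes))

# Per meta-distribution: sample test label distributions, then report the
# mean and variance of the (placeholder) performance across those samples.
for i, mean in enumerate(means):
    scores = np.array([placeholder_metric(p) for p in sample_label_distributions(mean)])
    print(f"meta-distribution {i}: mean = {scores.mean():.3f}, variance = {scores.var():.3f}")
```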