Long-tailed Recognition by Routing Diverse Distribution-Aware Experts

Authors: Xudong Wang, Long Lian, Zhongqi Miao, Ziwei Liu, Stella Yu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE). It reduces the model variance with multiple experts, reduces the model bias with a distribution-aware diversity loss, and reduces the computational cost with a dynamic expert routing module. RIDE outperforms the state-of-the-art by 5% to 7% on the CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks. It is also a universal framework that is applicable to various backbone networks, long-tailed algorithms, and training mechanisms for consistent performance gains. Our code is available at: https://github.com/frank-xwang/RIDE-LongTailRecognition. (A minimal multi-expert sketch appears after this table.)
Researcher Affiliation | Academia | Xudong Wang1, Long Lian1, Zhongqi Miao1, Ziwei Liu2, Stella X. Yu1; 1UC Berkeley / ICSI, 2Nanyang Technological University; {xdwang,longlian,zhongqi.miao,stellayu}@berkeley.edu, ziwei.liu@ntu.edu.sg
Pseudocode | No | The paper describes the method using figures and text but does not include a formal pseudocode or algorithm block.
Open Source Code | Yes | Our code is available at: https://github.com/frank-xwang/RIDE-LongTailRecognition.
Open Datasets | Yes | 1. CIFAR100-LT (Cao et al., 2019): CIFAR100 is sampled by class per an exponential decay across classes. We choose imbalance factor 100 and a ResNet-32 (He et al., 2016) backbone. 2. ImageNet-LT (Liu et al., 2019): Multiple backbone networks are experimented on ImageNet-LT... 3. iNaturalist 2018 (Van Horn et al., 2018): It is a naturally imbalanced fine-grained dataset with 8,142 categories. (A sketch of the exponential-decay class counts appears after this table.)
Dataset Splits | Yes | The original version of CIFAR-100 contains 50,000 images in the training set and 10,000 images in the validation set, with 100 categories.
Hardware Specification | Yes | All backbone networks are trained with a batch size of 256 on 8 RTX 2080Ti GPUs for 100 epochs.
Software Dependencies | No | The paper mentions optimizers (SGD) and backbone networks (ResNet-32) but does not specify software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | CIFAR100-LT is trained for 200 epochs with standard data augmentations (He et al., 2016) and a batch size of 128 on one RTX 2080Ti GPU. The learning rate is initialized as 0.1 and decayed by 0.01 at epochs 120 and 160, respectively. (A sketch of this schedule appears after this table.)
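To make the multi-expert idea quoted above concrete, the following is a minimal, illustrative PyTorch sketch of a shared trunk feeding several independent expert heads whose logits are averaged. It is not the authors' RIDE implementation: the distribution-aware diversity loss and the dynamic expert routing module are omitted, and all names (TinyMultiExpertNet, num_experts, feat_dim) are hypothetical.

```python
import torch
import torch.nn as nn

class TinyMultiExpertNet(nn.Module):
    """Illustrative multi-expert classifier: a shared trunk feeds several
    independent expert heads whose logits are averaged at inference.
    A toy stand-in for the idea described in the abstract, not RIDE itself."""

    def __init__(self, num_classes=100, num_experts=3, feat_dim=64):
        super().__init__()
        # Shared feature extractor (stand-in for the shared early ResNet stages).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feat_dim),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Independent expert heads (stand-in for the per-expert later stages).
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_experts)]
        )

    def forward(self, x):
        feats = self.trunk(x)
        expert_logits = torch.stack([head(feats) for head in self.experts], dim=0)
        # Average expert predictions; RIDE additionally trains a router that
        # decides how many experts to run per sample, which is omitted here.
        return expert_logits.mean(dim=0)

# Toy usage on CIFAR-sized inputs.
model = TinyMultiExpertNet(num_classes=100, num_experts=3)
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 100])
```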
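The CIFAR100-LT construction quoted in the Open Datasets row (exponential decay across classes with imbalance factor 100) is commonly implemented, following Cao et al. (2019), by keeping roughly max_per_class * (1/imbalance_factor)^(i/(C-1)) samples for class i. The sketch below assumes the standard 500 training images per class in balanced CIFAR-100; the function name is illustrative.

```python
def long_tailed_class_counts(num_classes=100, max_per_class=500, imbalance_factor=100):
    """Per-class sample counts following an exponential decay across classes,
    so the largest class keeps max_per_class samples and the smallest keeps
    roughly max_per_class / imbalance_factor (the common CIFAR100-LT recipe)."""
    counts = []
    for cls in range(num_classes):
        frac = cls / (num_classes - 1)  # 0 for the head class, 1 for the tail class
        counts.append(int(max_per_class * (1.0 / imbalance_factor) ** frac))
    return counts

counts = long_tailed_class_counts()
print(counts[0], counts[-1])  # 500 for the head class, 5 for the tail class
```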
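Finally, a minimal sketch of the quoted training schedule (SGD, initial learning rate 0.1, 200 epochs, decays at epochs 120 and 160). It reads "decayed by 0.01" as a multiplicative factor applied at each milestone via PyTorch's MultiStepLR; the momentum and weight-decay values are typical CIFAR settings assumed here, not quoted from the paper, and the model is a placeholder.

```python
import torch
import torch.nn as nn

# Placeholder model; momentum and weight decay are assumed typical CIFAR values.
model = nn.Linear(32 * 32 * 3, 100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# Quoted schedule: lr starts at 0.1 and is "decayed by 0.01 at epoch 120 and 160",
# read here as a multiplicative factor of 0.01 at each milestone.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[120, 160], gamma=0.01)

for epoch in range(200):
    # ... one pass over CIFAR100-LT with batch size 128 would go here ...
    optimizer.step()   # placeholder update so the schedule can be stepped
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # ~1e-05 after the two decays
```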