Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition

Authors: Haotao Wang, Aston Zhang, Yi Zhu, Shuai Zheng, Mu Li, Alex J. Smola, Zhangyang Wang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method outperforms previous state-of-the-art method by 1.29%, 1.45%, 0.69% anomaly detection false positive rate (FPR) and 3.24%, 4.06%, 7.89% in-distribution classification accuracy on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT, respectively.
Researcher Affiliation | Collaboration | 1 University of Texas at Austin, Austin, USA; 2 Amazon Web Services, Santa Clara, USA.
Pseudocode | Yes | Algorithm 1: Partial and Asymmetric Supervised Contrastive Learning (PASCL)
Open Source Code | Yes | Code and pre-trained models are available at https://github.com/amazon-research/long-tailed-ood-detection.
Open Datasets | Yes | We use three popular long-tailed image classification datasets, CIFAR10-LT, CIFAR100-LT (Cao et al., 2019), and ImageNet-LT (Liu et al., 2019), as the in-distribution training data (i.e., D_in).
Dataset Splits | Yes | We use the original CIFAR10 and CIFAR100 test sets and the ImageNet validation set as the in-distribution test sets (i.e., D_in^test).
Hardware Specification | No | The paper does not explicitly mention the specific hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions the "Adam (Kingma & Ba, 2014) optimizer" and "cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016)", but it does not specify version numbers for broader software components or libraries (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | For experiments on CIFAR10-LT and CIFAR100-LT, we train the main branch (i.e., stage 1 in Algorithm 1) for 200 epochs using the Adam (Kingma & Ba, 2014) optimizer with initial learning rate 1×10^-3 and batch size 256. We decay the learning rate to 0 using a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016). For auxiliary branch finetuning (i.e., stage 2 in Algorithm 1), we finetune the auxiliary branch for 3 epochs using the Adam optimizer with initial learning rate 5×10^-4. Other hyper-parameters are the same as in main branch training. For experiments on ImageNet-LT, we follow the settings in (Wang et al., 2021b). Specifically, we train the main branch for 100 epochs using the SGD optimizer with initial learning rate 0.1 and batch size 256. We decay the learning rate by a factor of 10 at epochs 60 and 80. For auxiliary branch finetuning, we finetune the auxiliary branch for 3 epochs using the SGD optimizer with initial learning rate 0.01, which is decayed by a factor of 10 after each finetuning epoch. On all datasets, we set τ = 0.1 following (Khosla et al., 2020), λ1 = 0.5 following (Hendrycks et al., 2019), and empirically set λ2 = 0.1 for PASCL.
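The CIFAR10/100-LT schedule reported above (stage 1: 200 epochs, Adam, learning rate 1×10^-3, batch size 256, cosine-annealed to 0; stage 2: 3 finetuning epochs, Adam, learning rate 5×10^-4) can be laid out as a minimal PyTorch sketch. This is not the authors' released implementation: the network, the synthetic data loader, the cross-entropy criterion (standing in for the PASCL objective), and the choice of which parameters form the "auxiliary branch" are all hypothetical placeholders used only to illustrate the optimizer and scheduler settings.

```python
# Sketch of the two-stage CIFAR-LT optimization schedule described in the
# Experiment Setup row. All model/data components are placeholders.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the long-tailed training set (batch size 256 as reported).
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))),
    batch_size=256, shuffle=True)

# Stand-in network and loss; the paper's PASCL objective is not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()

# Stage 1: main branch, 200 epochs, Adam, lr 1e-3, cosine annealing to 0.
epochs_main = 200
opt_main = optim.Adam(model.parameters(), lr=1e-3)
sched_main = CosineAnnealingLR(opt_main, T_max=epochs_main, eta_min=0.0)

for epoch in range(epochs_main):
    for images, labels in train_loader:
        opt_main.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        opt_main.step()
    sched_main.step()  # decay once per epoch

# Stage 2: auxiliary branch finetuning, 3 epochs, Adam, lr 5e-4,
# other hyper-parameters unchanged.
aux_params = model[-1].parameters()  # placeholder for the auxiliary branch
epochs_aux = 3
opt_aux = optim.Adam(aux_params, lr=5e-4)
sched_aux = CosineAnnealingLR(opt_aux, T_max=epochs_aux, eta_min=0.0)

for epoch in range(epochs_aux):
    for images, labels in train_loader:
        opt_aux.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        opt_aux.step()
    sched_aux.step()
```

The ImageNet-LT schedule differs only in the optimizer settings quoted above (SGD, 100 epochs, lr 0.1 stepped down by 10 at epochs 60 and 80; 3 finetuning epochs at lr 0.01, decayed by 10 after each epoch), which would map onto the same two-stage structure with `optim.SGD` and `MultiStepLR`/`StepLR` schedulers.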