CADet: Fully Self-Supervised Out-Of-Distribution Detection With Contrastive Learning
Authors: Charles Guille-Escuret, Pau Rodriguez, David Vazquez, Ioannis Mitliagkas, Joao Monteiro
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work explores the use of self-supervised contrastive learning for the simultaneous detection of two types of OOD samples: unseen classes and adversarial perturbations. First, we pair self-supervised contrastive learning with the maximum mean discrepancy (MMD) two-sample test. This approach enables us to robustly test whether two independent sets of samples originate from the same distribution, and we demonstrate its effectiveness by discriminating between CIFAR-10 and CIFAR-10.1 with higher confidence than previous work. ...CADet outperforms existing adversarial detection methods in identifying adversarially perturbed samples on ImageNet and achieves comparable performance to unseen label detection methods on two challenging benchmarks: ImageNet-O and iNaturalist. |
| Researcher Affiliation | Collaboration | Charles Guille-Escuret (ServiceNow Research, Mila, Université de Montréal) guillech@mila.quebec; Pau Rodriguez (ServiceNow Research) pau.rodriguez@servicenow.com; David Vazquez (ServiceNow Research) david.vazquez@servicenow.com; Ioannis Mitliagkas (Mila, Université de Montréal, Canada CIFAR AI Chair) ioannis@mila.quebec; Joao Monteiro (ServiceNow Research) joao.monteiro@servicenow.com |
| Pseudocode | Yes | The full pseudocode of the calibration and testing steps is given in Appendix A. Algorithm 2: CADet calibration step. Algorithm 3: CADet testing step. |
| Open Source Code | Yes | Our code to compute CADet scores is publicly available as an OpenOOD fork at https://github.com/charlesGE/OpenOOD-CADet. |
| Open Datasets | Yes | We train on 8 V100 GPUs with an accumulated batch size of 1024. ... We pre-train a ResNet50 with ImageNet as in-distribution. ...CIFAR-10 [32], CIFAR-10.1 [52]... ImageNet-O [26], explicitly designed to be challenging for OOD detection with ImageNet as in-distribution. iNaturalist, using the subset in [28] made of plant classes that do not intersect ImageNet. |
| Dataset Splits | Yes | We use |X_val^(2)| = 2000 in-distribution samples, |X_val^(1)| = 300 separate samples to compute cross-similarities, and 50 transformations per sample. ...We run MMD-CC and MMD two-sample tests for 100 different samplings of S_P^(1), S_P^(2), S_Q, using n_perm = 500 each time... For these experiments, we sample S_P^(1) and S_P^(2) 5000 times across all of ImageNet's validation set and compare their MMD and MMD-CC estimators to the one obtained from S_P and S_Q. |
| Hardware Specification | Yes | We train on 8 V100 GPUs with an accumulated batch size of 1024. ...it only required less than 10 minutes on a single A100 GPU |
| Software Dependencies | No | The paper mentions using the LARS optimizer, but does not provide specific version numbers for any software libraries (e.g., Python, PyTorch, TensorFlow) or other dependencies. |
| Experiment Setup | Yes | Hyperparameters: We follow as closely as possible the setting from SimCLRv2, with a few modifications to adapt to hardware limitations. In particular, we use the LARS optimizer [72] with learning rate 1.2, momentum 0.9, and weight decay 10^-4. Iteration-wise, we scale up the learning rate linearly for the first 40 epochs, then use an iteration-wise cosine decaying schedule until epoch 800, with temperature τ = 0.1. We train on 8 V100 GPUs with an accumulated batch size of 1024. ... We use synchronized BatchNorm and fp32 precision and do not use a memory buffer. We use the same set of transformations, i.e., Gaussian blur and horizontal flip with probability 0.5, color jittering with probability 0.8, random crop with scale uniformly sampled in [0.08, 1], and grayscale with probability 0.2. ...We use |X_val^(2)| = 2000 in-distribution samples, |X_val^(1)| = 300 separate samples to compute cross-similarities, and 50 transformations per sample. We fix the random crop scale to 0.75. ...PGD: norm L∞, δ = 0.02, step size 0.002, 50 iterations; CW: norm L2, δ = 0.10, learning rate of 0.03, and 50 iterations; FGSM: norm L∞, δ = 0.05. |
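
The MMD two-sample test with n_perm permutations, quoted in the rows above, can be sketched as a standard permutation test. This is a minimal illustration only, not the paper's MMD-CC variant: the RBF kernel, the `sigma` bandwidth, and all function names here are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise squared distances between rows, then a Gaussian (RBF) kernel.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Biased squared-MMD estimate between sample sets X and Y.
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean())

def permutation_test(X, Y, n_perm=500, sigma=1.0, seed=0):
    # p-value: fraction of permuted-split MMD estimates >= the observed one.
    rng = np.random.default_rng(seed)
    observed = mmd2(X, Y, sigma)
    pooled = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if mmd2(pooled[idx[:n]], pooled[idx[n:]], sigma) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A small p-value rejects the hypothesis that the two sets come from the same distribution, which is how the CIFAR-10 vs. CIFAR-10.1 comparison in the abstract is scored.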
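
The attack settings listed in the setup row (PGD: L∞, δ = 0.02, step 0.002, 50 iterations; FGSM: L∞, δ = 0.05) follow the standard signed-gradient recipes. A toy NumPy sketch of the two update rules, assuming a caller-supplied gradient function and inputs in [0, 1]; the function names are mine, not the paper's:

```python
import numpy as np

def fgsm(x, grad, delta=0.05):
    # FGSM: one L-infinity step of size delta along the gradient sign.
    return np.clip(x + delta * np.sign(grad), 0.0, 1.0)

def pgd(x, grad_fn, delta=0.02, step=0.002, iters=50):
    # PGD: iterated signed-gradient steps, each projected back onto
    # the L-infinity ball of radius delta around the clean input x.
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - delta, x + delta)  # project onto the ball
        x_adv = np.clip(x_adv, 0.0, 1.0)              # keep valid pixel range
    return x_adv
```

In practice `grad_fn` would return the gradient of the attack loss with respect to the input, e.g. via autodiff in a deep-learning framework.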
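
The learning-rate schedule described above (linear warmup over the first 40 epochs from a base rate of 1.2, then cosine decay until epoch 800) can be sketched at epoch granularity. The paper applies the schedule per iteration, so this coarser version is illustrative only:

```python
import math

def lr_schedule(epoch, base_lr=1.2, warmup=40, total=800):
    # Linear warmup for the first `warmup` epochs, then cosine decay
    # from base_lr down to 0 at epoch `total`.
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / (total - warmup)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

The per-iteration version simply evaluates the same curve at fractional epochs, i.e. `epoch + step / steps_per_epoch`.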