Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
Authors: Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions. Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods. |
| Researcher Affiliation | Academia | Dan Hendrycks UC Berkeley hendrycks@berkeley.edu Mantas Mazeika UIUC mantas3@illinois.edu Saurav Kadavath* UC Berkeley sauravkadavath@berkeley.edu Dawn Song UC Berkeley dawnsong@berkeley.edu |
| Pseudocode | No | The paper contains mathematical equations but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and our expanded ImageNet validation dataset are available at https://github.com/hendrycks/ss-ood. |
| Open Datasets | Yes | Using self-supervised learning techniques on CIFAR-10 and ImageNet for out-of-distribution detection... For the outlier dataset, we use 80 Million Tiny Images [Torralba et al., 2008] with CIFAR-10 and CIFAR-100 examples removed. |
| Dataset Splits | No | To select the number of fine-tuning epochs, we use a validation split of the CIFAR-10 training dataset with clean labels and select a value to bring accuracy close to that of Normal Training. The paper does not report specific split percentages or example counts. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or cloud instance specifications are mentioned. |
| Software Dependencies | No | The paper mentions optimizers (SGD) and architectures (Wide Residual Networks) but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For training, we use SGD with Nesterov momentum of 0.9 and a batch size of 128. We use an initial learning rate of 0.1 and a cosine learning rate schedule Loshchilov and Hutter [2016] and weight decay of 5 × 10⁻⁴. (See the sketch below the table.) |
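
The sketch below shows how the reported optimizer settings could be wired together, assuming a PyTorch-style training loop (the linked repository is PyTorch-based). The placeholder model, epoch count, and single-batch loop are illustrative assumptions, not the paper's exact implementation, which uses Wide Residual Networks.

```python
# Hedged sketch of the reported training hyperparameters:
# SGD with Nesterov momentum 0.9, batch size 128, initial LR 0.1,
# cosine LR schedule (Loshchilov and Hutter, 2016), weight decay 5e-4.
import torch
import torch.nn as nn

# Placeholder model for illustration only; the paper trains Wide Residual Networks.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

epochs = 100        # illustrative; the paper's epoch counts vary by experiment
batch_size = 128    # as reported

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate, as reported
    momentum=0.9,       # Nesterov momentum, as reported
    nesterov=True,
    weight_decay=5e-4,  # as reported
)
# Cosine annealing of the learning rate over training.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # One random mini-batch per epoch for illustration; a real loop would
    # iterate the full training set in batches of size `batch_size`.
    inputs = torch.randn(batch_size, 3, 32, 32)
    targets = torch.randint(0, 10, (batch_size,))
    loss = nn.functional.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

With this schedule the learning rate decays smoothly from 0.1 toward zero over the course of training, consistent with the cited cosine annealing scheme; the auxiliary self-supervised loss used in the paper is omitted here.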