Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

Authors: Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions. Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods.
Researcher Affiliation | Academia | Dan Hendrycks (UC Berkeley, hendrycks@berkeley.edu); Mantas Mazeika (UIUC, mantas3@illinois.edu); Saurav Kadavath* (UC Berkeley, sauravkadavath@berkeley.edu); Dawn Song (UC Berkeley, dawnsong@berkeley.edu)
Pseudocode | No | The paper contains mathematical equations but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and our expanded ImageNet validation dataset are available at https://github.com/hendrycks/ss-ood.
Open Datasets | Yes | Using self-supervised learning techniques on CIFAR-10 and ImageNet for out-of-distribution detection... For the outlier dataset, we use 80 Million Tiny Images [Torralba et al., 2008] with CIFAR-10 and CIFAR-100 examples removed.
Dataset Splits | No | To select the number of fine-tuning epochs, we use a validation split of the CIFAR-10 training dataset with clean labels and select a value to bring accuracy close to that of Normal Training. However, the paper does not give specific split percentages or example counts (an illustrative split is sketched after the table).
Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or cloud instance specifications are mentioned.
Software Dependencies | No | The paper mentions optimizers (SGD) and architectures (Wide Residual Networks) but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For training, we use SGD with Nesterov momentum of 0.9 and a batch size of 128. We use an initial learning rate of 0.1, a cosine learning rate schedule [Loshchilov and Hutter, 2016], and a weight decay of 5×10⁻⁴.
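
The hyperparameters quoted in the Experiment Setup row map directly onto standard PyTorch optimizer and scheduler settings. The sketch below is a minimal illustration under that assumption; the epoch count, the model argument, and the CIFAR-10 loader are placeholders rather than details confirmed by the row above.

```python
# Minimal sketch of the quoted training setup (assumptions: PyTorch and
# torchvision are installed; EPOCHS and the CIFAR-10 loader are placeholders).
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

EPOCHS = 100  # placeholder; the exact epoch count is not quoted above


def build_optimizer_and_scheduler(model: nn.Module):
    # SGD with Nesterov momentum 0.9, initial learning rate 0.1,
    # and weight decay 5e-4, as stated in the experiment setup.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9,
        nesterov=True, weight_decay=5e-4,
    )
    # Cosine learning rate schedule (Loshchilov and Hutter, 2016).
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
    return optimizer, scheduler


# Batch size of 128, as stated; CIFAR-10 is shown as an example dataset.
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor(),
    ),
    batch_size=128, shuffle=True, num_workers=4,
)
```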
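
For the Dataset Splits row, the quoted procedure carves a clean-label validation split out of the CIFAR-10 training set but does not report its size. The sketch below shows one hypothetical way to build such a split; the 90/10 ratio and the fixed seed are assumptions for illustration only.

```python
# Hypothetical CIFAR-10 train/validation split; the 90/10 ratio and the seed
# are illustrative assumptions, since the paper does not report split sizes.
import torch
import torchvision
import torchvision.transforms as T

full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor(),
)

val_size = len(full_train) // 10           # assume 10% held out for validation
train_size = len(full_train) - val_size
train_set, val_set = torch.utils.data.random_split(
    full_train, [train_size, val_size],
    generator=torch.Generator().manual_seed(0),  # fixed seed for repeatability
)

# The held-out split would then be used to choose the number of fine-tuning
# epochs so that accuracy stays close to that of normal training.
```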