Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

Authors: Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions. Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods.
Researcher Affiliation | Academia | Dan Hendrycks (UC Berkeley, hendrycks@berkeley.edu); Mantas Mazeika (UIUC, mantas3@illinois.edu); Saurav Kadavath* (UC Berkeley, sauravkadavath@berkeley.edu); Dawn Song (UC Berkeley, dawnsong@berkeley.edu)
Pseudocode | No | The paper contains mathematical equations but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and our expanded ImageNet validation dataset are available at https://github.com/hendrycks/ss-ood.
Open Datasets | Yes | Using self-supervised learning techniques on CIFAR-10 and ImageNet for out-of-distribution detection... For the outlier dataset, we use 80 Million Tiny Images [Torralba et al., 2008] with CIFAR-10 and CIFAR-100 examples removed.
Dataset Splits | No | To select the number of fine-tuning epochs, we use a validation split of the CIFAR-10 training dataset with clean labels and select a value to bring accuracy close to that of Normal Training. However, the paper does not give specific split percentages or example counts (an illustrative split is sketched after the table).
Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or cloud instance specifications are mentioned.
Software Dependencies | No | The paper mentions optimizers (SGD) and architectures (Wide Residual Networks) but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For training, we use SGD with Nesterov momentum of 0.9 and a batch size of 128. We use an initial learning rate of 0.1, a cosine learning rate schedule [Loshchilov and Hutter, 2016], and a weight decay of 5×10⁻⁴.
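
The hyperparameters quoted in the Experiment Setup row map directly onto standard PyTorch optimizer and scheduler settings. The sketch below is a minimal illustration under that assumption; the epoch count, the model argument, and the CIFAR-10 loader are placeholders rather than details confirmed by the row above.

```python
# Minimal sketch of the quoted training setup (assumptions: PyTorch and
# torchvision are installed; EPOCHS and the CIFAR-10 loader are placeholders).
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

EPOCHS = 100  # placeholder; the exact epoch count is not quoted above


def build_optimizer_and_scheduler(model: nn.Module):
    # SGD with Nesterov momentum 0.9, initial learning rate 0.1,
    # and weight decay 5e-4, as stated in the experiment setup.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9,
        nesterov=True, weight_decay=5e-4,
    )
    # Cosine learning rate schedule (Loshchilov and Hutter, 2016).
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
    return optimizer, scheduler


# Batch size of 128, as stated; CIFAR-10 is shown as an example dataset.
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor(),
    ),
    batch_size=128, shuffle=True, num_workers=4,
)
```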
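
For the Dataset Splits row, the quoted procedure carves a clean-label validation split out of the CIFAR-10 training set but does not report its size. The sketch below shows one hypothetical way to build such a split; the 90/10 ratio and the fixed seed are assumptions for illustration only.

```python
# Hypothetical CIFAR-10 train/validation split; the 90/10 ratio and the seed
# are illustrative assumptions, since the paper does not report split sizes.
import torch
import torchvision
import torchvision.transforms as T

full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor(),
)

val_size = len(full_train) // 10           # assume 10% held out for validation
train_size = len(full_train) - val_size
train_set, val_set = torch.utils.data.random_split(
    full_train, [train_size, val_size],
    generator=torch.Generator().manual_seed(0),  # fixed seed for repeatability
)

# The held-out split would then be used to choose the number of fine-tuning
# epochs so that accuracy stays close to that of normal training.
```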