Using Pre-Training Can Improve Model Robustness and Uncertainty

Authors: Dan Hendrycks, Kimin Lee, Mantas Mazeika

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on label corruption, class imbalance, adversarial examples, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods.
Researcher Affiliation | Academia | Dan Hendrycks (UC Berkeley), Kimin Lee (KAIST), Mantas Mazeika (University of Chicago)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at github.com/hendrycks/pre-training.
Open Datasets | Yes | Datasets. For the following robustness experiments, we evaluate on CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009). For pre-training, we use Downsampled ImageNet (Chrabaszcz et al., 2017), which is the 1,000-class ImageNet dataset (Deng et al., 2009) resized to 32×32 resolution. For the problem of out-of-distribution detection... we use the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets (Johnson et al.).
Dataset Splits | No | The paper explicitly states the training and testing splits for CIFAR-10 and CIFAR-100 ('50,000 for training and 10,000 for testing'), and it mentions using '10% of the training data to estimate the optimum temperature' for temperature tuning, but it does not specify a general validation split for hyperparameter tuning or model selection during the main training runs (a calibration-split sketch follows the table).
Hardware Specification | No | The paper does not provide any specifics about the hardware used for the experiments (e.g., CPU or GPU models, memory, or the broader computing infrastructure).
Software Dependencies | No | The paper mentions optimizers (SGD with Nesterov momentum), learning rate schedules (cosine learning rate schedule), and network architectures (Wide Residual Networks, All Convolutional Network), but it does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, or scikit-learn versions) required for reproducibility.
Experiment Setup | Yes | In all experiments, we use 40-2 Wide Residual Networks, SGD with Nesterov momentum, and a cosine learning rate schedule (Loshchilov & Hutter, 2016). The Normal experiments train for 100 epochs with a learning rate of 0.1 and use dropout at a drop rate of 0.3... The experiments with pre-training train for 10 epochs without dropout, and use a learning rate of 0.001... and 0.01 in the experiments with label noise corrections. (A training-configuration sketch follows the table.)
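
The Experiment Setup row above fully specifies the optimizer, schedule, and learning rates, so the configuration can be sketched directly. The snippet below is a minimal sketch assuming PyTorch (the paper does not name a framework); the momentum and weight-decay values and the `build_optimizer` helper are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
import torch.optim as optim

def build_optimizer(model, pretrained: bool, epochs: int):
    # From-scratch ("Normal") runs: 100 epochs, lr 0.1, dropout 0.3 in the model.
    # Fine-tuning from Downsampled ImageNet: 10 epochs, no dropout,
    # lr 0.001 (0.01 when label-noise corrections are used).
    lr = 0.001 if pretrained else 0.1
    optimizer = optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, nesterov=True,   # momentum value assumed
                          weight_decay=5e-4)             # weight decay assumed
    # Cosine learning-rate schedule (Loshchilov & Hutter, 2016), stepped once per epoch.
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```

The 40-2 Wide Residual Network and its 0.3 drop rate belong to the model definition rather than the optimizer, so they appear here only as comments.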
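Similarly, the Dataset Splits row notes that the only held-out data the paper mentions is the 10% of the training set used to estimate the optimum temperature for confidence calibration. Below is a minimal sketch of that step, again assuming PyTorch and a simple grid search; the paper's exact optimization procedure is not specified, and `tune_temperature`, `val_logits`, and `val_labels` are illustrative names.

```python
# Minimal sketch: pick a softmax temperature on a held-out 10% calibration split.
import torch
import torch.nn.functional as F

def tune_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Grid-search the temperature T that minimizes NLL on held-out logits."""
    best_t, best_nll = 1.0, float("inf")
    for t in torch.arange(0.5, 5.01, 0.05):
        nll = F.cross_entropy(val_logits / t, val_labels).item()
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t

# At test time, calibrated probabilities are softmax(test_logits / best_t).
```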