Using Pre-Training Can Improve Model Robustness and Uncertainty

Authors: Dan Hendrycks, Kimin Lee, Mantas Mazeika

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on label corruption, class imbalance, adversarial examples, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods.
Researcher Affiliation | Academia | Dan Hendrycks (UC Berkeley), Kimin Lee (KAIST), Mantas Mazeika (University of Chicago)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at github.com/hendrycks/pre-training.
Open Datasets | Yes | Datasets. For the following robustness experiments, we evaluate on CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009). For pre-training, we use Downsampled ImageNet (Chrabaszcz et al., 2017), which is the 1,000-class ImageNet dataset (Deng et al., 2009) resized to 32×32 resolution. For the problem of out-of-distribution detection... we use the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets (Johnson et al.).
Dataset Splits | No | The paper explicitly states the training and testing splits for CIFAR-10 and CIFAR-100 ('50,000 for training and 10,000 for testing'), and it mentions using '10% of the training data to estimate the optimum temperature' for temperature tuning, but it does not specify a general validation split for hyperparameter tuning or model selection during the main training runs (a calibration-split sketch follows the table).
Hardware Specification | No | The paper does not provide any specifics about the hardware used for the experiments (e.g., CPU or GPU models, memory, or the broader computing infrastructure).
Software Dependencies | No | The paper mentions optimizers (SGD with Nesterov momentum), learning rate schedules (cosine learning rate schedule), and network architectures (Wide Residual Networks, All Convolutional Network), but it does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, or scikit-learn versions) required for reproducibility.
Experiment Setup | Yes | In all experiments, we use 40-2 Wide Residual Networks, SGD with Nesterov momentum, and a cosine learning rate schedule (Loshchilov & Hutter, 2016). The Normal experiments train for 100 epochs with a learning rate of 0.1 and use dropout at a drop rate of 0.3... The experiments with pre-training train for 10 epochs without dropout, and use a learning rate of 0.001... and 0.01 in the experiments with label noise corrections. (A training-configuration sketch follows the table.)
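
The Experiment Setup row above fully specifies the optimizer, schedule, and learning rates, so the configuration can be sketched directly. The snippet below is a minimal sketch assuming PyTorch (the paper does not name a framework); the momentum and weight-decay values and the `build_optimizer` helper are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
import torch.optim as optim

def build_optimizer(model, pretrained: bool, epochs: int):
    # From-scratch ("Normal") runs: 100 epochs, lr 0.1, dropout 0.3 in the model.
    # Fine-tuning from Downsampled ImageNet: 10 epochs, no dropout,
    # lr 0.001 (0.01 when label-noise corrections are used).
    lr = 0.001 if pretrained else 0.1
    optimizer = optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, nesterov=True,   # momentum value assumed
                          weight_decay=5e-4)             # weight decay assumed
    # Cosine learning-rate schedule (Loshchilov & Hutter, 2016), stepped once per epoch.
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```

The 40-2 Wide Residual Network and its 0.3 drop rate belong to the model definition rather than the optimizer, so they appear here only as comments.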
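Similarly, the Dataset Splits row notes that the only held-out data the paper mentions is the 10% of the training set used to estimate the optimum temperature for confidence calibration. Below is a minimal sketch of that step, again assuming PyTorch and a simple grid search; the paper's exact optimization procedure is not specified, and `tune_temperature`, `val_logits`, and `val_labels` are illustrative names.

```python
# Minimal sketch: pick a softmax temperature on a held-out 10% calibration split.
import torch
import torch.nn.functional as F

def tune_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Grid-search the temperature T that minimizes NLL on held-out logits."""
    best_t, best_nll = 1.0, float("inf")
    for t in torch.arange(0.5, 5.01, 0.05):
        nll = F.cross_entropy(val_logits / t, val_labels).item()
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t

# At test time, calibrated probabilities are softmax(test_logits / best_t).
```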