Experts Don’t Cheat: Learning What You Don’t Know By Predicting Pairs

Authors: Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris J. Maddison

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.
Researcher Affiliation | Collaboration | ¹Google DeepMind; ²University of Toronto, Department of Computer Science, Ontario, Canada. Correspondence to: Daniel D. Johnson <ddjohnson@cs.toronto.edu>.
Pseudocode | Yes | Algorithm 1: Conservative adjustment of V̂_θ
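
Algorithm 1 itself is not reproduced in this excerpt. As a minimal sketch of the quantity it conservatively adjusts, the paper's pair-prediction "cheating" gap, the snippet below computes, for a discrete model that outputs a joint table over two responses, how far the conditional p_θ(y2 | x, y1) moves from the marginal p_θ(y2 | x). The function name and joint-table representation are illustrative assumptions, not the paper's code.

    import numpy as np

    def cheating_score(joint: np.ndarray) -> float:
        """Expected total-variation gap between p(y2 | x, y1) and p(y2 | x).

        `joint` is a (K, K) table with joint[i, j] = p_theta(y1 = i, y2 = j | x)
        for a single input x. A model whose two predictions are independent
        given x scores 0; the more observing y1 shifts its prediction of y2,
        the larger the score.
        """
        p_y1 = joint.sum(axis=1)                             # p(y1 | x)
        p_y2 = joint.sum(axis=0)                             # p(y2 | x)
        cond = joint / np.clip(p_y1[:, None], 1e-12, None)   # p(y2 | x, y1)
        tv = 0.5 * np.abs(cond - p_y2[None, :]).sum(axis=1)  # TV per value of y1
        return float((p_y1 * tv).sum())                      # expectation over y1

    # A pair predictor that never cheats scores ~0:
    independent = np.outer([0.7, 0.3], [0.7, 0.3])
    print(cheating_score(independent))  # 0.0

How Algorithm 1 turns this kind of estimate into a conservative one is specified in the paper itself.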
Open Source Code | No | The paper does not provide a link to, or an explicit statement about the release of, source code for the method it describes.
Open Datasets | Yes | We demonstrate our technique on CIFAR-10H (Peterson et al., 2019), a relabeling of the CIFAR-10 test set (Krizhevsky, 2009) by > 50 independent annotators per image.
Dataset Splits | Yes | We use the next 2,000 images in CIFAR-10H as our validation set.
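
For orientation, that split is a contiguous slice of the 10,000-image CIFAR-10 test set that CIFAR-10H relabels. A minimal sketch, assuming the standard Keras CIFAR-10 loader, a cifar10h-probs.npy soft-label file from the CIFAR-10H release, and a placeholder train_end (the excerpt does not state how many images precede the validation block):

    import numpy as np
    import tensorflow as tf

    # CIFAR-10H relabels the CIFAR-10 test split, so images come from the
    # standard test set and soft labels from the CIFAR-10H annotations.
    (_, _), (images, _) = tf.keras.datasets.cifar10.load_data()
    soft_labels = np.load("cifar10h-probs.npy")  # assumed filename, shape (10000, 10)

    train_end = 8000  # placeholder; the excerpt does not give the training size
    val_images = images[train_end:train_end + 2000]      # the "next 2,000 images"
    val_labels = soft_labels[train_end:train_end + 2000]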
Hardware Specification | No | No specific hardware details for the experiments are mentioned beyond general acknowledgements of computing resources.
Software Dependencies | No | The paper mentions software such as TensorFlow, Keras, JAX, and Optax, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | We train each method using the AdamW optimizer (Loshchilov & Hutter, 2017) with batch size 512. We divide our training and hyperparameter tuning into the following phases: ... We perform a random search over learning rate and weight decay strength with 250 trials: we choose the learning rate logarithmically spaced between 10^-5 and 5×10^-3, and we either sample weight decay uniformly between 0.05 and 0.5, or logarithmically between 10^-6 and 0.05... We use a linear warmup for the learning rate during the first epoch, then use cosine decay.
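
The quoted search and schedule map directly onto Optax, which the paper lists among its dependencies. A minimal sketch, assuming an even split between the two weight-decay sampling modes (the excerpt does not state the proportion) and placeholder step counts:

    import numpy as np
    import optax

    def sample_hyperparams(rng: np.random.Generator) -> dict:
        # Learning rate: log-uniform between 1e-5 and 5e-3, as quoted above.
        lr = float(np.exp(rng.uniform(np.log(1e-5), np.log(5e-3))))
        # Weight decay: uniform in [0.05, 0.5] or log-uniform in [1e-6, 0.05];
        # the 50/50 choice between the two modes is an assumption.
        if rng.uniform() < 0.5:
            wd = float(rng.uniform(0.05, 0.5))
        else:
            wd = float(np.exp(rng.uniform(np.log(1e-6), np.log(0.05))))
        return {"learning_rate": lr, "weight_decay": wd}

    def make_optimizer(hp: dict, steps_per_epoch: int, total_steps: int):
        # Linear warmup over the first epoch, then cosine decay.
        schedule = optax.warmup_cosine_decay_schedule(
            init_value=0.0,
            peak_value=hp["learning_rate"],
            warmup_steps=steps_per_epoch,
            decay_steps=total_steps,
        )
        return optax.adamw(schedule, weight_decay=hp["weight_decay"])

    rng = np.random.default_rng(0)
    trials = [sample_hyperparams(rng) for _ in range(250)]  # 250-trial random search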