reproducibilityindex.ai

Experts Don’t Cheat: Learning What You Don’t Know By Predicting Pairs

Authors: Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris J. Maddison

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.
Researcher Affiliation	Collaboration	(1) Google DeepMind; (2) University of Toronto, Department of Computer Science, Ontario, Canada. Correspondence to: Daniel D. Johnson <ddjohnson@cs.toronto.edu>.
Pseudocode	Yes	Algorithm 1: Conservative adjustment of $\hat{V}^{\theta}$
Open Source Code	No	The paper does not provide a direct link or explicit statement about the release of its own source code for the methodology described.
Open Datasets	Yes	We demonstrate our technique on CIFAR-10H (Peterson et al., 2019), a relabeling of the CIFAR-10 test set (Krizhevsky, 2009) by > 50 independent annotators per image.
Dataset Splits	Yes	We use the next 2,000 images in CIFAR-10H as our validation set.
Hardware Specification	No	No specific hardware details for the experiments are mentioned beyond general acknowledgements of computing resources.
Software Dependencies	No	The paper mentions software like TensorFlow, Keras, JAX, and Optax, but does not provide specific version numbers for these ancillary software components.
Experiment Setup	Yes	We train each method using the AdamW optimizer (Loshchilov & Hutter, 2017) with batch size 512. We divide our training and hyperparameter tuning into the following phases: ... We perform a random search over learning rate and weight decay strength with 250 trials: we choose learning rate logarithmically spaced between 10^-5 and 5 × 10^-3, and we either sample weight decay uniformly between 0.05 and 0.5, or logarithmically between 10^-6 and 0.05... We use a linear warmup for the learning rate during the first epoch, then use cosine weight decay.
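
The random search and learning-rate schedule quoted in the Experiment Setup row can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' code (the paper releases none). "Logarithmically spaced" is read here as log-uniform sampling, the choice between the two weight decay regimes is assumed to be a fair coin flip, and the post-warmup schedule is assumed to be a cosine decay of the learning rate to zero. The names `log_uniform`, `sample_trial`, and `lr_at_step` are hypothetical.

```python
import math
import random

# Search ranges taken from the Experiment Setup row.
LR_MIN, LR_MAX = 1e-5, 5e-3
NUM_TRIALS = 250


def log_uniform(rng, low, high):
    """Sample so that log(value) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_trial(rng):
    """Draw one (learning rate, weight decay) configuration."""
    learning_rate = log_uniform(rng, LR_MIN, LR_MAX)
    # ASSUMPTION: the source does not say how often each regime is chosen;
    # a 50/50 split is used here purely for illustration.
    if rng.random() < 0.5:
        weight_decay = rng.uniform(0.05, 0.5)
    else:
        weight_decay = log_uniform(rng, 1e-6, 0.05)
    return {"learning_rate": learning_rate, "weight_decay": weight_decay}


def lr_at_step(step, peak_lr, steps_per_epoch, total_steps):
    """Linear warmup over the first epoch, then cosine decay.

    ASSUMPTION: the decay ends at zero at `total_steps`; the source does
    not state the final value.
    """
    if step < steps_per_epoch:
        return peak_lr * (step + 1) / steps_per_epoch
    decay_steps = max(1, total_steps - steps_per_epoch)
    progress = (step - steps_per_epoch) / decay_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


rng = random.Random(0)  # fixed seed so the sampled trials are repeatable
trials = [sample_trial(rng) for _ in range(NUM_TRIALS)]
```

Each entry of `trials` would then configure one AdamW training run with batch size 512; selecting the best trial on the validation set is left out because the source elides the details of the tuning phases.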