Experts Don’t Cheat: Learning What You Don’t Know By Predicting Pairs
Authors: Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris J. Maddison
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques. |
| Researcher Affiliation | Collaboration | ¹Google DeepMind; ²University of Toronto, Department of Computer Science, Ontario, Canada. Correspondence to: Daniel D. Johnson <ddjohnson@cs.toronto.edu>. |
| Pseudocode | Yes | Algorithm 1: Conservative adjustment of V̂_θ |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the release of its own source code for the methodology described. |
| Open Datasets | Yes | We demonstrate our technique on CIFAR-10H (Peterson et al., 2019), a relabeling of the CIFAR-10 test set (Krizhevsky, 2009) by > 50 independent annotators per image. |
| Dataset Splits | Yes | We use the next 2,000 images in CIFAR-10H as our validation set. |
| Hardware Specification | No | No specific hardware details for the experiments are mentioned beyond general acknowledgements of computing resources. |
| Software Dependencies | No | The paper mentions software like TensorFlow, Keras, JAX, and Optax, but does not provide specific version numbers for these ancillary software components. |
| Experiment Setup | Yes | We train each method using the AdamW optimizer (Loshchilov & Hutter, 2017) with batch size 512. We divide our training and hyperparameter tuning into the following phases: ... We perform a random search over learning rate and weight decay strength with 250 trials: we choose the learning rate logarithmically spaced between 10⁻⁵ and 5 × 10⁻³, and we either sample weight decay uniformly between 0.05 and 0.5, or logarithmically between 10⁻⁶ and 0.05... We use a linear warmup for the learning rate during the first epoch, then use cosine decay. |
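
The Experiment Setup row describes AdamW with batch size 512, a linear warmup over the first epoch followed by a cosine schedule, and a 250-trial random search over learning rate and weight decay. Below is a minimal sketch of that configuration using Optax (which the paper mentions alongside JAX, though without version numbers); the step counts, the even split between the two weight-decay sampling distributions, and the helper names are illustrative assumptions rather than details taken from the paper.

```python
# Hedged sketch of the reported training configuration: AdamW, batch size 512,
# linear warmup for one epoch then cosine decay, and a 250-trial random search
# over learning rate and weight decay. `steps_per_epoch`, `total_steps`, and the
# 50/50 choice between the two weight-decay distributions are assumptions.
import numpy as np
import optax


def sample_hyperparameters(rng: np.random.Generator) -> dict:
    """Draw one random-search trial over learning rate and weight decay."""
    # Learning rate sampled logarithmically between 1e-5 and 5e-3.
    learning_rate = 10.0 ** rng.uniform(np.log10(1e-5), np.log10(5e-3))
    # Weight decay: either uniform in [0.05, 0.5] or log-uniform in [1e-6, 0.05].
    if rng.random() < 0.5:  # assumed even split between the two distributions
        weight_decay = rng.uniform(0.05, 0.5)
    else:
        weight_decay = 10.0 ** rng.uniform(np.log10(1e-6), np.log10(0.05))
    return {"learning_rate": learning_rate, "weight_decay": weight_decay}


def make_optimizer(learning_rate: float, weight_decay: float,
                   steps_per_epoch: int, total_steps: int) -> optax.GradientTransformation:
    """AdamW with linear warmup over the first epoch, then cosine decay."""
    schedule = optax.warmup_cosine_decay_schedule(
        init_value=0.0,
        peak_value=learning_rate,
        warmup_steps=steps_per_epoch,  # linear warmup for the first epoch
        decay_steps=total_steps,       # cosine decay over the full run
    )
    return optax.adamw(learning_rate=schedule, weight_decay=weight_decay)


# Example: draw the 250 random-search trials and build one optimizer.
rng = np.random.default_rng(0)
trials = [sample_hyperparameters(rng) for _ in range(250)]
optimizer = make_optimizer(**trials[0], steps_per_epoch=100, total_steps=10_000)
```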