Distinguishing the Knowable from the Unknowable with Language Models

Authors: Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level and that probes trained on one text domain generalize to others. Going further, we propose a fully unsupervised method that achieves non-trivial accuracy on the same task. Taken together, we interpret these results as evidence that LLMs naturally contain internal representations of different types of uncertainty that could potentially be leveraged to devise more informative indicators of model confidence in diverse practical settings. Code can be found at: https://github.com/KempnerInstitute/llm_uncertainty
Researcher Affiliation | Collaboration | Gustaf Ahdritz*1, Tian Qin*1, Nikhil Vyas1, Boaz Barak1, Benjamin L. Edelman1. 1Harvard University. Correspondence to: Gustaf Ahdritz <gahdritz@g.harvard.edu>, Tian Qin <tqin@g.harvard.edu>. BB is currently also affiliated with OpenAI, but this work was done while he was at Harvard.
Pseudocode | No | No pseudocode or algorithm blocks are explicitly provided or labeled in the paper.
Open Source Code | Yes | Code can be found at: https://github.com/KempnerInstitute/llm_uncertainty
Open Datasets | Yes | For LLaMA models, we use the set of Wikipedia articles created (not last edited) between the models' training cutoff and June 2023. The training set for the LLaMA models contains a small fraction of older Wikipedia data (Touvron et al., 2023a;b). We also use the designated Pile evaluation and test sets (Gao et al., 2021).
Dataset Splits | Yes | Of the 71586 articles in the Wikipedia set (approx. 18.5 million tokens), we set aside 2900 each for validation and testing and use the remaining articles as a training set for our prediction heads. (A hedged split sketch follows the table.)
Hardware Specification | Yes | We use PyTorch (Paszke et al., 2019) and A100 GPUs.
Software Dependencies | Yes | To train all supervised methods we use Adam (Kingma & Ba, 2015) and a learning rate of 10^-5. ... We use PyTorch (Paszke et al., 2019) and A100 GPUs.
Experiment Setup | Yes | To train all supervised methods we use Adam (Kingma & Ba, 2015) and a learning rate of 10^-5. Heads have a hidden dimension of 2048 and either one or zero hidden layers (in the nonlinear and linear cases, respectively). Classification heads are trained with standard cross-entropy loss; regression heads with least squares. All heads are trained with early stopping based on validation loss. (A hedged PyTorch sketch of this setup follows the table.)
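
The split quoted in the Dataset Splits row is simple enough to state in code. The sketch below is a hypothetical reconstruction, assuming a random article-level shuffle; the actual seed, article ordering, and any filtering used by the authors are not specified here.

```python
# Hypothetical reconstruction of the article-level split described above:
# hold out 2900 Wikipedia articles each for validation and testing, and use
# the remaining articles to train the prediction heads. The shuffle and seed
# are assumptions, not details taken from the paper or its repository.
import random


def split_articles(articles, n_val=2900, n_test=2900, seed=0):
    articles = list(articles)
    random.Random(seed).shuffle(articles)
    val = articles[:n_val]
    test = articles[n_val:n_val + n_test]
    train = articles[n_val + n_test:]
    return train, val, test
```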
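
The Experiment Setup row specifies the head architecture and optimizer but not the surrounding training loop. The PyTorch sketch below assembles the quoted details (Adam at 10^-5, a hidden dimension of 2048 with one or zero hidden layers, cross-entropy loss, early stopping on validation loss) into a minimal runnable form; the embedding dimension, number of classes, data loaders, epoch budget, and patience are illustrative assumptions, as is how the frozen-model embeddings and target labels are produced.

```python
# Minimal sketch of the head training setup quoted above. Placeholder values
# (embed_dim, num_classes, max_epochs, patience, data loaders) are assumptions,
# not details from the paper or its repository.
import torch
import torch.nn as nn


def make_head(embed_dim: int, num_classes: int, nonlinear: bool) -> nn.Module:
    """Zero hidden layers (linear probe) or one hidden layer of width 2048."""
    if nonlinear:
        return nn.Sequential(
            nn.Linear(embed_dim, 2048),
            nn.ReLU(),
            nn.Linear(2048, num_classes),
        )
    return nn.Linear(embed_dim, num_classes)


def train_head(head, train_loader, val_loader, max_epochs=50, patience=3):
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-5)
    loss_fn = nn.CrossEntropyLoss()  # regression heads would use nn.MSELoss()
    best_val, epochs_without_improvement = float("inf"), 0
    for _ in range(max_epochs):
        head.train()
        for embeddings, labels in train_loader:  # embeddings from a frozen LLM
            optimizer.zero_grad()
            loss = loss_fn(head(embeddings), labels)
            loss.backward()
            optimizer.step()

        # Early stopping based on validation loss, as described in the paper.
        head.eval()
        with torch.no_grad():
            val_loss = sum(
                loss_fn(head(e), y).item() for e, y in val_loader
            ) / len(val_loader)
        if val_loss < best_val:
            best_val, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return head
```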