Distinguishing the Knowable from the Unknowable with Language Models
Authors: Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental (see the probe sketch after the table) | We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level and that probes trained on one text domain generalize to others. Going further, we propose a fully unsupervised method that achieves non-trivial accuracy on the same task. Taken together, we interpret these results as evidence that LLMs naturally contain internal representations of different types of uncertainty that could potentially be leveraged to devise more informative indicators of model confidence in diverse practical settings. Code can be found at: https://github.com/KempnerInstitute/llm_uncertainty |
| Researcher Affiliation | Collaboration | Gustaf Ahdritz*¹, Tian Qin*¹, Nikhil Vyas¹, Boaz Barak¹, Benjamin L. Edelman¹; ¹Harvard University. Correspondence to: Gustaf Ahdritz <gahdritz@g.harvard.edu>, Tian Qin <tqin@g.harvard.edu>. BB is currently also affiliated with OpenAI, but this work was done while he was at Harvard. |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly provided or labeled in the paper. |
| Open Source Code | Yes | Code can be found at: https://github.com/KempnerInstitute/llm_uncertainty |
| Open Datasets | Yes | For LLaMA models, we use the set of Wikipedia articles created (not last edited) between the models' training cutoff and June 2023. The training set for the LLaMA models contains a small fraction of older Wikipedia data (Touvron et al., 2023a;b). We also use the designated Pile evaluation and test sets (Gao et al., 2021). |
| Dataset Splits | Yes (see the split sketch after the table) | Of the 71,586 articles in the Wikipedia set (approx. 18.5 million tokens), we set aside 2,900 each for validation and testing and use the remaining articles as a training set for our prediction heads. |
| Hardware Specification | Yes | We use PyTorch (Paszke et al., 2019) and A100 GPUs. |
| Software Dependencies | Yes | To train all supervised methods we use Adam (Kingma & Ba, 2015) and a learning rate of 10⁻⁵. ... We use PyTorch (Paszke et al., 2019) and A100 GPUs. |
| Experiment Setup | Yes (see the training sketch after the table) | To train all supervised methods we use Adam (Kingma & Ba, 2015) and a learning rate of 10⁻⁵. Heads have a hidden dimension of 2048 and either one or zero hidden layers (in the nonlinear and linear cases, respectively). Classification heads are trained with standard cross-entropy loss; regression heads with least squares. All heads are trained with early stopping based on validation loss. |
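
The Research Type row quotes the paper's core method: small probes trained on the embeddings of a frozen, pretrained model to predict when a larger model will be confident at the token level. Below is a minimal PyTorch sketch of such a head, consistent with the Experiment Setup row (hidden dimension 2048, zero or one hidden layers); the class name, the `embed_dim` argument, and the binary "larger model confident?" target are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class ProbeHead(nn.Module):
    """Small head over frozen LLM token embeddings.

    Linear (zero hidden layers) or nonlinear (one hidden layer of width 2048),
    matching the Experiment Setup row; all other details are assumptions.
    """

    def __init__(self, embed_dim: int, num_classes: int = 2,
                 hidden_dim: int = 2048, nonlinear: bool = False):
        super().__init__()
        if nonlinear:
            self.net = nn.Sequential(
                nn.Linear(embed_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_classes),
            )
        else:
            self.net = nn.Linear(embed_dim, num_classes)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, embed_dim) embeddings taken from the frozen small model;
        # the output logits score whether the larger model will be confident at each token.
        return self.net(hidden_states)
```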
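
The Dataset Splits row reports 71,586 Wikipedia articles with 2,900 set aside each for validation and testing. A small sketch of that split follows; the shuffling, the seed, and the function name are assumptions, since the paper only states the split sizes.

```python
import random


def split_articles(articles, n_val=2900, n_test=2900, seed=0):
    """Reserve 2,900 articles each for validation and testing; the rest train the heads.

    The shuffle and seed are assumptions; the paper specifies only the split sizes.
    """
    rng = random.Random(seed)
    shuffled = list(articles)
    rng.shuffle(shuffled)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test
```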
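
The Experiment Setup row specifies Adam with a learning rate of 10⁻⁵, cross-entropy loss for classification heads (least squares for regression heads), and early stopping on validation loss. The loop below is a sketch under those constraints; the epoch budget, patience, batching interface, and device handling are assumptions not stated in the paper.

```python
import copy

import torch
import torch.nn as nn


def train_probe(head, train_loader, val_loader, max_epochs=50, patience=3, device="cuda"):
    """Train a classification probe with Adam (lr 1e-5), cross-entropy loss, and
    early stopping on validation loss. Epoch budget and patience are assumptions."""
    head = head.to(device)
    opt = torch.optim.Adam(head.parameters(), lr=1e-5)
    loss_fn = nn.CrossEntropyLoss()  # regression heads would use nn.MSELoss() (least squares)

    best_val, best_state, epochs_since_best = float("inf"), None, 0
    for _ in range(max_epochs):
        head.train()
        for embeddings, labels in train_loader:
            opt.zero_grad()
            loss = loss_fn(head(embeddings.to(device)), labels.to(device))
            loss.backward()
            opt.step()

        # Early stopping on validation loss, as described in the paper.
        head.eval()
        val_loss, n = 0.0, 0
        with torch.no_grad():
            for embeddings, labels in val_loader:
                batch_loss = loss_fn(head(embeddings.to(device)), labels.to(device))
                val_loss += batch_loss.item() * len(labels)
                n += len(labels)
        val_loss /= n

        if val_loss < best_val:
            best_val = val_loss
            best_state = copy.deepcopy(head.state_dict())
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break

    if best_state is not None:
        head.load_state_dict(best_state)
    return head
```

A usage sketch with hypothetical values: `head = ProbeHead(embed_dim=4096, nonlinear=False)` followed by `head = train_probe(head, train_loader, val_loader)`, where the loaders yield (embedding, label) batches.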