Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling
Authors: Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, Yang Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations demonstrate that input clarification ensembling provides accurate and reliable uncertainty quantification on several language processing tasks. (A minimal sketch of the uncertainty decomposition appears after this table.) |
| Researcher Affiliation | Collaboration | UC Santa Barbara; MIT-IBM Watson AI Lab, IBM Research; MIT CSAIL. Correspondence to: Bairu Hou <bairu@ucsb.edu>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data are available at https://github.com/UCSB-NLP-Chang/llm_uncertainty. |
| Open Datasets | Yes | We evaluate the total uncertainty on the Natural Questions (NQ) dataset (Kwiatkowski et al., 2019) and GSM8K (Cobbe et al., 2021). For detecting question ambiguity, we select the AmbigQA dataset (Min et al., 2020). |
| Dataset Splits | Yes | We use the full AmbigInst dataset and randomly sample 200 examples from the validation set of AmbigQA for evaluation. We fine-tune Llama-3-8B-Instruct on the full training set of the AmbigQA dataset... We evaluate the model on the validation set and take the checkpoint that achieves the lowest validation loss (epoch = 2) for testing. |
| Hardware Specification | Yes | We fine-tune Llama-3-8B-Instruct on the full training set of the AmbigQA dataset on 4 NVIDIA H100 80GB HBM3 GPUs. |
| Software Dependencies | No | The paper mentions using "PyTorch Lightning, DeepSpeed Stage 1, and flash-attention 2" but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train the model with batch size 16, a learning rate of 2e-5, and a cosine learning rate scheduler for 5 epochs. The loss is computed only on the output tokens. (A hedged training sketch follows the table.) |
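The method named in the title ensembles an LLM's predictions over multiple clarifications of a possibly ambiguous input and decomposes the resulting uncertainty. The sketch below assumes the standard ensemble entropy decomposition (total uncertainty is the entropy of the averaged predictive distribution; the gap to the mean per-member entropy measures disagreement across clarifications). It is an illustration under that assumption, not the authors' exact formulation, and the toy distributions are invented.

```python
# Minimal sketch of uncertainty decomposition over input clarifications,
# assuming the standard ensemble entropy decomposition:
#   total = H(mean p);  disagreement-from-ambiguity = total - mean H(p).
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + 1e-12))

def decompose(answer_dists):
    """answer_dists: one predictive distribution per input clarification."""
    answer_dists = np.asarray(answer_dists, dtype=float)
    total = entropy(answer_dists.mean(axis=0))               # total uncertainty
    expected = np.mean([entropy(p) for p in answer_dists])   # per-clarification average
    ambiguity = total - expected  # uncertainty attributable to input ambiguity
    return total, expected, ambiguity

# Toy example: two clarifications of an ambiguous question lead to confident
# but conflicting answers, so most uncertainty is attributed to ambiguity.
dists = [[0.9, 0.1], [0.1, 0.9]]
print(decompose(dists))  # total ~0.693, expected ~0.325, ambiguity ~0.368
```

On this toy example each clarification yields a confident answer, but the answers disagree, so the decomposition attributes most of the total uncertainty to input ambiguity rather than to per-clarification model uncertainty.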
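The "Experiment Setup", "Hardware Specification", and "Software Dependencies" rows pin down the fine-tuning recipe. Below is a minimal sketch of that recipe; the authors report PyTorch Lightning with DeepSpeed Stage 1 and flash-attention 2, so the Hugging Face Trainer here is a substitution for brevity, and the dataset field names ("question", "answer") are assumptions not confirmed by the paper.

```python
# Hedged sketch of the reported recipe: global batch size 16, learning rate
# 2e-5, cosine schedule, 5 epochs, loss computed only on output tokens.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="bfloat16")

def preprocess(example):
    # Mask prompt tokens with -100 so the causal-LM loss covers only the
    # output tokens (token-boundary alignment is approximate here).
    prompt_ids = tokenizer(example["question"])["input_ids"]
    full_ids = tokenizer(example["question"] + example["answer"])["input_ids"]
    labels = [-100] * len(prompt_ids) + full_ids[len(prompt_ids):]
    return {"input_ids": full_ids, "labels": labels}

args = TrainingArguments(
    output_dir="llama3-ambigqa",
    per_device_train_batch_size=4,  # 4 GPUs x 4 per device = global batch 16
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=5,
    bf16=True,
    # Evaluate and checkpoint per epoch, then keep the epoch with the lowest
    # validation loss (the paper reports epoch 2).
)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_set.map(preprocess),
#                   eval_dataset=val_set.map(preprocess))
# trainer.train()
```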