Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

Authors: Ryan Liu, Theodore Sumers, Ishita Dasgupta, Thomas L. Griffiths

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then conduct three experiments testing a suite of state-of-the-art LLMs.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Princeton University; (2) Anthropic, work performed while at Princeton University; (3) Google DeepMind; (4) Department of Psychology, Princeton University.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | The paper describes generating stimuli based on the signaling bandits paradigm from Sumers et al. (2023) and refers to 'utterance-context pairs from Sumers et al. (2023)', but it does not provide concrete access information (link, DOI, repository) for the specific dataset (input stimuli and collected responses) used in its experiments.
Dataset Splits | No | The paper evaluates pre-trained large language models (LLMs) and does not describe training, validation, or test splits for its own experiments. The term 'train' is used in the context of LLM training techniques (e.g., RLHF), not dataset partitioning for the reported experiments.
Hardware Specification | No | The paper mentions using various large language models (LLaMA, Mixtral, GPT) and their respective API settings but does not provide any hardware details (e.g., GPU models, CPU types, cloud instance specifications) used to run the experiments.
Software Dependencies | No | The paper mentions using WebPPL and refers to models by name (LLaMA, Mixtral, GPT) and API settings, but it does not specify software dependencies with version numbers (e.g., Python version, specific library versions such as PyTorch 1.9, or explicit software environment details) used for the experiments.
Experiment Setup | Yes | We use temperature = 0.1, top_p = 0.9 for the LLaMA models ... temperature = 0.7, top_p = 1 for the Mixtral models ... and temperature = 1, top_p = 1 for the GPT models ... We used the same grid of λ ∈ [0, 1] in steps of .05 and β_S, β_L ∈ [1, 10] in steps of 1.
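The Experiment Setup row packs two kinds of settings into one quote: per-family decoding parameters and a parameter grid. The Python sketch below collects the reported values and enumerates the grid so its size is explicit. It is not the authors' code (the table reports none as released, and the paper mentions WebPPL rather than Python). The names `DECODING_PARAMS`, `LAMBDAS`, `BETAS`, the family keys and the helper `parameter_grid` are hypothetical. It also assumes that λ, β_S and β_L are the free parameters of the Sumers et al. (2023) model being fit, with β_S and β_L presumably the speaker and listener rationality parameters.

```python
from itertools import product

# Decoding settings per model family, as quoted in the Experiment Setup row.
# The dictionary layout and the family keys are illustrative assumptions.
DECODING_PARAMS = {
    "llama": {"temperature": 0.1, "top_p": 0.9},
    "mixtral": {"temperature": 0.7, "top_p": 1.0},
    "gpt": {"temperature": 1.0, "top_p": 1.0},
}

# lambda in [0, 1] in steps of 0.05 (21 values, endpoints included).
# Rounding removes float drift such as 3 * 0.05 == 0.15000000000000002.
LAMBDAS = [round(i * 0.05, 2) for i in range(21)]

# beta_S and beta_L in [1, 10] in steps of 1 (10 values each).
BETAS = list(range(1, 11))


def parameter_grid():
    """Yield every (lambda, beta_S, beta_L) triple in the grid (hypothetical helper)."""
    for lam, beta_s, beta_l in product(LAMBDAS, BETAS, BETAS):
        yield lam, beta_s, beta_l


if __name__ == "__main__":
    grid = list(parameter_grid())
    print(len(grid))
```

Running the script prints 2100, the size of the 21 × 10 × 10 grid. Fitting the model would mean scoring each triple against the observed responses, which this sketch deliberately does not attempt.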