KTO: Model Alignment as Prospect Theoretic Optimization

Authors: Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4.2. Experiments: We subject the models to: (1) winrate experiments following §3.3, where for some test inputs GPT-4-0613 is used to judge the aligned model's generation against the SFT target; (2) generative benchmarks such as MMLU (0-shot) (Hendrycks et al., 2021), GSM8K (8-shot, chain-of-thought) (Cobbe et al., 2021), HumanEval (0-shot) (Chen et al., 2021), and BigBench-Hard (3-shot, chain-of-thought) (Srivastava et al., 2022)." (a hedged winrate-judging sketch follows the table) |
| Researcher Affiliation | Collaboration | ¹Stanford University (first author was an intern at Contextual AI); ²Contextual AI. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | "Our code is available on Github; models are on Huggingface." |
| Open Datasets | Yes | "The models were trained on a combination of Anthropic-HH (Ganguli et al., 2022), Open Assistant (Köpf et al., 2023), and SHP (Ethayarajh et al., 2022)." (a hedged dataset-loading sketch follows the table) |
| Dataset Splits | No | The paper mentions "test inputs" and "test data" for evaluation, but it does not specify explicit train/validation/test splits, with percentages or counts, for its experiments. It refers to standard datasets but not the specific splits needed for reproduction. |
| Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments. It mentions model scales from 1B to 30B parameters but not the hardware they were run on. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments (e.g., Python, PyTorch, or other relevant frameworks). |
| Experiment Setup | Yes | "All models are aligned under identical settings on the same data (e.g., same effective batch size, same optimizer, etc.), save for hyperparameters unique to them." (an illustrative shared-config sketch follows the table) |
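
To make the winrate protocol in the Research Type row concrete, here is a minimal sketch of a GPT-4-0613 judging loop, assuming the OpenAI Python client. The prompt wording, the `judge_winrate` helper, and the `test_set` iterable are illustrative assumptions; only the judge model (GPT-4-0613) and the aligned-generation-versus-SFT-target comparison come from the paper, and this is not the authors' released evaluation code.

```python
# Hedged sketch of a GPT-4-0613 winrate judgment as described in Sec. 4.2 of the paper.
# The judge prompt wording and helper names are our assumptions, not the authors' code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_winrate(prompt: str, aligned_output: str, sft_target: str) -> bool:
    """Return True if the judge prefers the aligned model's generation over the SFT target."""
    judge_prompt = (
        "You are comparing two responses to the same instruction.\n"
        f"Instruction:\n{prompt}\n\n"
        f"Response A:\n{aligned_output}\n\n"
        f"Response B:\n{sft_target}\n\n"
        "Which response is better? Answer with exactly 'A' or 'B'."
    )
    reply = client.chat.completions.create(
        model="gpt-4-0613",  # judge model named in the paper
        messages=[{"role": "user", "content": judge_prompt}],
        temperature=0,
    )
    return reply.choices[0].message.content.strip().upper().startswith("A")

# Winrate over a test set: fraction of inputs where the aligned generation wins.
# `test_set` is a hypothetical iterable of (prompt, aligned_output, sft_target) triples.
# winrate = sum(judge_winrate(p, a, t) for p, a, t in test_set) / len(test_set)
```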
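
The three corpora in the Open Datasets row are public, so a reproduction would likely start by pulling them from the Hugging Face Hub. The sketch below assumes the `datasets` library and the commonly used hub IDs (`Anthropic/hh-rlhf`, `OpenAssistant/oasst1`, `stanfordnlp/SHP`); these identifiers are assumptions rather than paths given in the paper, and the paper's mixing and preprocessing steps are omitted.

```python
# Hedged sketch: loading the three public preference datasets named in the paper
# with the Hugging Face `datasets` library. The hub IDs below are the commonly
# used public releases and are an assumption, not identifiers taken from the paper.
from datasets import load_dataset

anthropic_hh = load_dataset("Anthropic/hh-rlhf")        # Anthropic-HH (Ganguli et al., 2022)
open_assistant = load_dataset("OpenAssistant/oasst1")   # Open Assistant (Köpf et al., 2023)
shp = load_dataset("stanfordnlp/SHP")                   # SHP (Ethayarajh et al., 2022)

# Quick sanity check on training-split sizes before any mixing or preprocessing.
print({
    "hh": anthropic_hh["train"].num_rows,
    "oasst1": open_assistant["train"].num_rows,
    "shp": shp["train"].num_rows,
})
```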
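
The Experiment Setup row amounts to a controlled comparison: one shared training configuration plus a small method-specific override per alignment method. The snippet below only illustrates that structure; every numeric value is a placeholder and none of the hyperparameters are taken from the paper.

```python
# Illustrative structure of the controlled comparison described in the paper:
# shared settings are identical across alignment methods, and only hyperparameters
# unique to each method differ. All values here are placeholders, not the paper's settings.
shared_config = {
    "effective_batch_size": 32,  # placeholder; identical across methods per the paper
    "optimizer": "adamw",        # placeholder; identical across methods per the paper
    "learning_rate": 5e-7,       # placeholder
    "max_seq_len": 2048,         # placeholder
}

method_overrides = {
    "dpo": {"beta": 0.1},        # placeholder value for DPO's method-specific hyperparameter
    "kto": {"beta": 0.1},        # placeholder value for KTO's method-specific hyperparameter
}

def build_run_config(method: str) -> dict:
    """Merge the shared settings with the hyperparameters unique to one method."""
    return {**shared_config, **method_overrides.get(method, {})}

print(build_run_config("kto"))
```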