Aligning LLM Agents by Learning Latent Preference from User Edits

Authors: Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Misra

NeurIPS 2024

Reproducibility assessment. Each entry lists the variable, the result, and the supporting LLM response:
Research Type: Experimental. "We introduce two interactive environments, summarization and email writing, and use a GPT-4-simulated user for evaluation. On both tasks, CIPHER outperforms several baselines, achieving the lowest edit distance cost while incurring only a small overhead in LLM query cost over the base agent. Our analysis reports that the user preferences learned by CIPHER show significant similarity to the ground-truth latent preferences."
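The edit distance cost used as the headline metric above can be made concrete. The sketch below computes a token-level Levenshtein distance between the agent's draft and the user's edited version; the function name and the use of plain whitespace tokenization are assumptions for illustration, not the paper's exact implementation (the paper tokenizes with Tiktoken).

```python
def edit_distance(source_tokens, target_tokens):
    """Token-level Levenshtein distance between an agent draft and a user edit.

    Counts the minimum number of single-token insertions, deletions, and
    substitutions needed to turn source_tokens into target_tokens.
    """
    m, n = len(source_tokens), len(target_tokens)
    # dp[j] holds the distance between the current source prefix and target_tokens[:j]
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev is the diagonal (i-1, j-1) cell
        for j in range(1, n + 1):
            cur = dp[j]
            if source_tokens[i - 1] == target_tokens[j - 1]:
                dp[j] = prev  # tokens match: no edit needed
            else:
                # substitution (diag), deletion (up), or insertion (left)
                dp[j] = 1 + min(prev, dp[j], dp[j - 1])
            prev = cur
    return dp[n]
```

A lower cumulative cost over rounds means the simulated user had to edit the agent's responses less, which is how the metric rewards preference learning.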
Researcher Affiliation: Collaboration. Ge Gao (Department of Computer Science, Cornell University); Alexey Taymanov, Eduardo Salinas, Paul Mineiro, and Dipendra Misra (Microsoft Research New York).
Pseudocode: Yes. "Algorithm 1 CIPHER(ϕ, k, δ). A context representation function ϕ : X → R^d, the retrieval hyperparameter k, and tolerance hyperparameter δ ≥ 0. We initialize history D = ∅."
Open Source Code: Yes. "Our code and data are publicly available at https://github.com/gao-g/prelude."
Open Datasets: Yes. "We use documents from several existing sources listed in Table 1. These sources represent a diverse category of documents that a writing assistant would typically encounter (see Table 4 in Appendix for examples). In any given round, the user is provided a context that is a document from one of the sources for the given task." Table 4: "Link to each source dataset, from which we randomly sample examples as the user-provided context in our tasks. CNN Daily Mail (See et al., 2017): https://huggingface.co/datasets/cnn_dailymail"
Dataset Splits: No. "Notably, there is no distinction between training and testing in our setting, as every natural use of the agent yields edit feedback for learning."
Hardware Specification: No. The paper states that GPT-4 is used as the base LLM but does not provide specific hardware details, such as GPU/CPU models or memory, for the experiments.
Software Dependencies: No. "We use GPT-4 as our base LLM for CIPHER and all baselines. We use the Tiktoken tokenizer. We experiment with MPNET (Song et al., 2020) and BERT (Devlin et al., 2019) as our two context representation functions ϕ." No specific version numbers for software libraries or dependencies are provided.
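MPNET and BERT embeddings require model downloads, but the interface they fill, ϕ : X → R^d, is simple to illustrate. The dependency-free stand-in below hashes a bag of words into a normalized d-dimensional vector; it is an illustration of the interface only, not a substitute for the sentence encoders the paper uses.

```python
def phi_hashed(text, d=64):
    """Toy context representation phi: X -> R^d via hashed bag-of-words.

    A dependency-free stand-in for the MPNET/BERT encoders used in the paper;
    real sentence embeddings capture semantics, this only captures word overlap.
    """
    vec = [0.0] * d
    for token in text.lower().split():
        vec[hash(token) % d] += 1.0
    norm = sum(x * x for x in vec) ** 0.5
    # L2-normalize so downstream cosine similarity reduces to a dot product
    return [x / norm for x in vec] if norm else vec
```

Any function with this shape (text in, fixed-length unit vector out) can be dropped into the retrieval step, which is why the paper can swap MPNET and BERT without changing the algorithm.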
Experiment Setup: Yes. "For both tasks, we run an experiment for T = 200 rounds. We experiment with two different values of the number of retrieved examples, k ∈ {1, 5}. For E-then-e LPI and Continual LPI we set Te = 5. We provide the GPT-4 user prompt template and user edit examples in Appendix B. Prompt templates used by CIPHER are provided in Table 7."
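The hyperparameters quoted above can be collected in one place. The sketch below is ours, not the paper's code: the field names are assumptions, the values of T, k, and Te come from the quoted setup, and the δ default is only constrained to δ ≥ 0 by Algorithm 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters for one experimental run (field names are ours)."""
    T: int = 200        # interaction rounds per experiment
    k: int = 5          # number of retrieved examples; the paper tries k in {1, 5}
    Te: int = 5         # exploration rounds for E-then-e LPI and Continual LPI
    delta: float = 0.0  # tolerance hyperparameter from Algorithm 1 (delta >= 0)


# One config per value of k evaluated in the paper
configs = [ExperimentConfig(k=k) for k in (1, 5)]
```

Freezing the dataclass keeps a run's hyperparameters immutable, so the two k settings can be compared without accidental in-place changes.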