reproducibilityindex.ai

Continual Learning for Instruction Following from Realtime Feedback

Authors: Alane Suhr, Yoav Artzi

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. ... We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time.
Researcher Affiliation	Academia	Alane Suhr, University of California, Berkeley (suhr@berkeley.edu); Yoav Artzi, Cornell University (yoav@cs.cornell.edu)
Pseudocode	Yes	Algorithm 1 Continual learning for instruction following from realtime user feedback.
Open Source Code	Yes	Our code and data is available here: https://github.com/lil-lab/clif_cb.
Open Datasets	Yes	Our code and data is available here: https://github.com/lil-lab/clif_cb. ... the demonstration training dataset D0 includes 8,790 instructions from 456 randomly-sampled human-human interactions from Suhr et al. [41].
Dataset Splits	Yes	We use a held-out subset of the original CEREALBAR training set as a validation set for early stopping, comprising 5% of the original split.
Hardware Specification	Yes	We use a single GeForce RTX 2080 Ti for training each model.
Software Dependencies	No	The paper mentions software components like 'BPE', 'LSTM RNN', 'LINGUNET', and 'ADAM' for optimization, but does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	For training, we use a batch size of 16 agent steps, a learning rate of 0.001, and ADAM [19] for optimization.
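
The reported setup values (batch size of 16 agent steps, learning rate of 0.001, ADAM, and a 5% held-out validation subset for early stopping) could be collected in one place roughly as follows. This is a minimal sketch and not the authors' code: the names TrainConfig and split_train_val, the seed, and the 1,000-item placeholder dataset are illustrative assumptions, and the repository linked above may organize these settings differently.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    batch_size: int = 16          # agent steps per batch (reported)
    learning_rate: float = 0.001  # reported
    optimizer: str = "adam"       # reported
    val_fraction: float = 0.05    # share held out for early stopping (reported)
    seed: int = 0                 # assumption: the table does not report a seed


def split_train_val(examples, val_fraction, seed):
    """Shuffle deterministically, then hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


cfg = TrainConfig()
# Placeholder data: 1,000 dummy example ids, not the real dataset.
train, val = split_train_val(range(1000), cfg.val_fraction, cfg.seed)
```

Fixing the seed keeps the held-out subset stable across runs, which matters when that subset drives early stopping.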