Continual Learning for Instruction Following from Realtime Feedback

Authors: Alane Suhr, Yoav Artzi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. ... We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time.
Researcher Affiliation | Academia | Alane Suhr, University of California, Berkeley (suhr@berkeley.edu); Yoav Artzi, Cornell University (yoav@cs.cornell.edu)
Pseudocode | Yes | Algorithm 1: Continual learning for instruction following from realtime user feedback. (A hedged sketch of this loop follows the table.)
Open Source Code | Yes | Our code and data is available here: https://github.com/lil-lab/clif_cb.
Open Datasets | Yes | Our code and data is available here: https://github.com/lil-lab/clif_cb. ... the demonstration training dataset D0 includes 8,790 instructions from 456 randomly-sampled human-human interactions from Suhr et al. [41].
Dataset Splits | Yes | We use a held-out subset of the original CEREALBAR training set as a validation set for early stopping, comprising 5% of the original split. (See the split sketch after the table.)
Hardware Specification | Yes | We use a single GeForce RTX 2080 Ti for training each model.
Software Dependencies | No | The paper mentions software components like 'BPE', 'LSTM RNN', 'LINGUNET', and 'ADAM' for optimization, but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | For training, we use a batch size of 16 agent steps, a learning rate of 0.001, and ADAM [19] for optimization. (See the training-setup sketch after the table.)
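
The Pseudocode row points to Algorithm 1, which alternates between deploying the agent to users, collecting realtime feedback, and retraining. Below is a minimal sketch of that round structure, not the authors' implementation; every name here (continual_learning, deploy_and_collect, feedback_to_examples, train_policy) is a hypothetical placeholder, and the helpers are passed in as callables so the sketch stays self-contained.

```python
# Hypothetical sketch of the continual-learning loop summarized by Algorithm 1:
# deploy the agent, collect realtime user feedback, retrain, repeat.
# None of these names come from the authors' repository (lil-lab/clif_cb).

def continual_learning(policy, demonstration_data, num_rounds,
                       deploy_and_collect, feedback_to_examples, train_policy):
    """Run `num_rounds` of deployment, feedback collection, and retraining."""
    # Warm-start on the demonstration dataset (D0 in the paper).
    policy = train_policy(policy, [demonstration_data])
    datasets = [demonstration_data]
    for _ in range(num_rounds):
        # Deploy the current policy in human-agent interactions and log
        # per-action user feedback signals.
        interactions = deploy_and_collect(policy)
        # Turn logged feedback into training examples for the next round.
        datasets.append(feedback_to_examples(interactions))
        # Retrain on data aggregated across all rounds so far.
        policy = train_policy(policy, datasets)
    return policy
```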
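The Dataset Splits row states that 5% of the original CEREALBAR training split is held out as a validation set for early stopping. The sketch below shows one way to carve out such a subset; the interaction-level granularity, the fixed seed, and the split_train_validation helper are assumptions, not the authors' code.

```python
import random

def split_train_validation(interactions, validation_fraction=0.05, seed=0):
    """Hold out a fraction of the training interactions as a validation set.

    Mirrors the quoted setup (5% of the original split reserved for early
    stopping); the split granularity here is an assumption.
    """
    rng = random.Random(seed)
    shuffled = interactions[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Example usage with dummy interaction IDs.
train_ids, val_ids = split_train_validation(list(range(456)))
```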
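The Experiment Setup row quotes three concrete hyperparameters: a batch size of 16 agent steps, a learning rate of 0.001, and ADAM for optimization. The sketch below wires those quoted values into a generic PyTorch training loop; the linear model and synthetic data are placeholders standing in for the paper's LINGUNET-based agent and CEREALBAR examples, and nothing else here is taken from the paper.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model standing in for the paper's LINGUNET-based agent.
model = nn.Linear(32, 4)

# Hyperparameters quoted in the Experiment Setup row.
optimizer = optim.Adam(model.parameters(), lr=0.001)
loader = DataLoader(
    TensorDataset(torch.randn(256, 32), torch.randint(0, 4, (256,))),  # synthetic data
    batch_size=16,
    shuffle=True,
)

loss_fn = nn.CrossEntropyLoss()
for inputs, targets in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```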