Continual Learning for Instruction Following from Realtime Feedback
Authors: Alane Suhr, Yoav Artzi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. ... We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time. |
| Researcher Affiliation | Academia | Alane Suhr (University of California, Berkeley; suhr@berkeley.edu); Yoav Artzi (Cornell University; yoav@cs.cornell.edu) |
| Pseudocode | Yes | Algorithm 1 Continual learning for instruction following from realtime user feedback. *(See the loop sketch after this table.)* |
| Open Source Code | Yes | Our code and data is available here: https://github.com/lil-lab/clif_cb. |
| Open Datasets | Yes | Our code and data is available here: https://github.com/lil-lab/clif_cb. ... the demonstration training dataset D0 includes 8,790 instructions from 456 randomly-sampled human-human interactions from Suhr et al. [41]. |
| Dataset Splits | Yes | We use a held-out subset of the original CEREALBAR training set as a validation set for early stopping, comprising 5% of the original split. *(See the split example after this table.)* |
| Hardware Specification | Yes | We use a single GeForce RTX 2080 Ti for training each model. |
| Software Dependencies | No | The paper names software components such as BPE tokenization, an LSTM RNN, LINGUNET, and the ADAM optimizer, but it does not report version numbers for any of its software dependencies. |
| Experiment Setup | Yes | For training, we use a batch size of 16 agent steps, a learning rate of 0.001, and ADAM [19] for optimization. *(See the training-setup sketch after this table.)* |
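
The pseudocode row above names Algorithm 1, the paper's alternation between deploying the agent and training on the feedback users give during deployment. As a reading aid, here is a minimal Python sketch of that loop under stated assumptions: every name here (`Policy`, `Example`, `deploy_and_collect`, `feedback_to_examples`, `train_policy`) is a hypothetical placeholder, not the authors' code, and the stub bodies only trace the data flow from the seed demonstrations D0 through successive feedback rounds.

```python
# Hedged sketch of the deploy/train alternation in Algorithm 1.
# All helper names, types, and stub bodies are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Example:
    instruction: str
    action: str
    weight: float = 1.0  # e.g., derived from positive/negative user feedback

@dataclass
class Policy:
    version: int = 0

def deploy_and_collect(policy: Policy) -> list[Example]:
    """Placeholder: deploy the policy and log realtime user feedback on its actions."""
    return [Example("move to the card", "FORWARD", weight=1.0)]

def feedback_to_examples(feedback: list[Example]) -> list[Example]:
    """Placeholder: convert feedback signals into weighted training examples."""
    return [ex for ex in feedback if ex.weight != 0.0]

def train_policy(policy: Policy, rounds_of_data: list[list[Example]]) -> Policy:
    """Placeholder: retrain on the aggregated data from all rounds so far."""
    return Policy(version=policy.version + 1)

def continual_learning(seed_demos: list[Example], num_rounds: int) -> Policy:
    policy = train_policy(Policy(), [seed_demos])  # initialize from D0 demonstrations
    data = [seed_demos]
    for _ in range(num_rounds):
        feedback = deploy_and_collect(policy)        # deployment phase
        data.append(feedback_to_examples(feedback))  # feedback -> training examples
        policy = train_policy(policy, data)          # training phase
    return policy
```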
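
The dataset-splits row reports holding out 5% of the original CEREALBAR training split as a validation set for early stopping. A small illustration of such a split, assuming a uniform random sample over the 456 seed interactions (the quote above does not specify the exact sampling unit):

```python
import random

random.seed(0)
interactions = [f"interaction_{i}" for i in range(456)]  # stand-in interaction IDs
random.shuffle(interactions)
n_val = round(0.05 * len(interactions))  # 5% held out for early stopping
val_split = interactions[:n_val]         # 23 interactions
train_split = interactions[n_val:]       # 433 interactions
```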
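
The experiment-setup row reads directly as an optimizer configuration. Below is a hedged PyTorch sketch using the reported hyperparameters (batch size 16, learning rate 0.001, Adam); the linear model and random tensors are placeholders and do not reproduce the paper's LSTM-plus-LINGUNET architecture.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(32, 4)  # placeholder policy network, not the paper's model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the paper
loss_fn = nn.CrossEntropyLoss()

# Random stand-in data; batches of 16, matching the reported batch size.
dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 4, (256,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for features, target in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(features), target)
    loss.backward()
    optimizer.step()
```

Early stopping against the 5% validation split sketched above would wrap this loop in epochs with a validation check after each one.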