Teachable Reinforcement Learning via Advice Distillation

Authors: Olivia Watkins, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Jacob Andreas

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In puzzle-solving, navigation, and locomotion domains, we show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms and often less than imitation learning.
Researcher Affiliation | Academia | Olivia Watkins (UC Berkeley, oliviawatkins@berkeley.edu); Trevor Darrell (UC Berkeley, trevor@eecs.berkeley.edu); Pieter Abbeel (UC Berkeley, pabbeel@cs.berkeley.edu); Jacob Andreas (MIT, jda@mit.edu); Abhishek Gupta (UC Berkeley, abhigupta@berkeley.edu)
Pseudocode | No | The paper describes the algorithms in prose and uses diagrams (e.g., Figure 2) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Appendix A for link to URL and run instructions in the README in the GitHub repo.
Open Datasets | Yes | Baby AI: In the open-source Baby AI [8] grid-world... Ant-Maze Navigation (Ant): The open-source ant-maze navigation domain [14] replaces the simple point mass agent... Envs we used are cited in section 4.1
Dataset Splits | Yes | The details of the exact set of training and testing tasks, as well as architecture and algorithmic details, are provided in the appendix. Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix A.
Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix A
Software Dependencies | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix A.
Experiment Setup | Yes | The details of the exact set of training and testing tasks, as well as architecture and algorithmic details, are provided in the appendix. Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix A.