Teachable Reinforcement Learning via Advice Distillation
Authors: Olivia Watkins, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Jacob Andreas
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In puzzle-solving, navigation, and locomotion domains, we show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms and often less than imitation learning. |
| Researcher Affiliation | Academia | Olivia Watkins UC Berkeley oliviawatkins@berkeley.edu Trevor Darrell UC Berkeley trevor@eecs.berkeley.edu Pieter Abbeel UC Berkeley pabbeel@cs.berkeley.edu Jacob Andreas MIT jda@mit.edu Abhishek Gupta UC Berkeley abhigupta@berkeley.edu |
| Pseudocode | No | The paper describes the algorithms in prose and uses diagrams (e.g., Figure 2) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Appendix A for link to URL and run instructions in the README in the github repo. |
| Open Datasets | Yes | BabyAI: In the open-source BabyAI [8] grid-world... Ant-Maze Navigation (Ant): The open-source ant-maze navigation domain [14] replaces the simple point mass agent... Envs we used are cited in section 4.1 |
| Dataset Splits | Yes | The details of the exact set of training and testing tasks, as well as architecture and algorithmic details, are provided in the appendix. Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix A. |
| Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix A |
| Software Dependencies | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix A. |
| Experiment Setup | Yes | The details of the exact set of training and testing tasks, as well as architecture and algorithmic details, are provided in the appendix. Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix A. |