In-context Reinforcement Learning with Algorithm Distillation
Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Stenberg Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data. |
| Researcher Affiliation | Industry | Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih — DeepMind |
| Pseudocode | Yes | Algorithm 1 Algorithm Distillation |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | No | A dataset of learning histories is collected by training N individual single-task gradient-based RL algorithms. This dataset is generated by the authors, and there is no information or link provided indicating it is publicly available. |
| Dataset Splits | No | The paper mentions 'Train {Mtrain} and test {Mtest} tasks' and 'sample a task Mtrain i randomly from the train task distribution.' It does not explicitly mention a separate validation dataset split for model training. |
| Hardware Specification | No | The paper mentions distributed RL algorithms (e.g., 'A3C...with 100 actors', 'distributed DQN with 16 parallel actors') which implies significant computational resources, but it does not specify any particular hardware components like GPU models, CPU types, or cloud instance configurations. |
| Software Dependencies | No | The paper mentions various models and algorithms used (e.g., 'GPT causal transformer model', 'RNNs', 'LSTM', 'A3C', 'DQN') and specific regularization techniques, but it does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages. |
| Experiment Setup | Yes | Table 1: Algorithm Distillation Architecture Hyperparameters. Table 2: Algorithm Distillation Optimization Hyperparameters. Table 3: Watermaze Image Encoder Hyperparameters. Table 4: Source A3C Algorithm Hyperparameters for Dark Environments. Table 5: Source DQN(Q-λ) Algorithm Hyperparameters for Watermaze. Table 6: RL2 Hyperparameters used in Dark Environments. Table 7: RL2 Hyperparameters used in the Watermaze Environment. |
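The pseudocode the review points to (Algorithm 1, Algorithm Distillation) casts in-context RL as behavioral cloning over across-episode learning histories: a causal transformer is trained to predict the source algorithm's next action given a long context of preceding (observation, action, reward) triples. The data-preparation step can be sketched as below; this is our illustrative reading of Algorithm 1, not the authors' code, the `make_ad_training_examples` helper name is hypothetical, and the transformer policy itself is omitted.

```python
def make_ad_training_examples(history, context_len):
    """Build (context, target_action) pairs from one task's learning history.

    `history` is a time-ordered list of (observation, action, reward)
    triples spanning the source RL algorithm's whole training run, so a
    context window can straddle episode boundaries -- the property that
    lets the distilled model imitate policy *improvement*, not just a
    fixed policy.
    """
    examples = []
    for t in range(1, len(history)):
        start = max(0, t - context_len)
        context = history[start:t]   # across-episode context fed to the model
        target = history[t][1]       # next action, the cloning target
        examples.append((context, target))
    return examples
```

In the paper the context length is chosen to span multiple episodes, since a context shorter than one episode cannot capture the learning progress present in the source data.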