In-context Reinforcement Learning with Algorithm Distillation

Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Stenberg Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
Researcher Affiliation | Industry | Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih. DeepMind.
Pseudocode | Yes | Algorithm 1: Algorithm Distillation
Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | No | A dataset of learning histories is collected by training N individual single-task gradient-based RL algorithms. This dataset is generated by the authors, and no information or link is provided to indicate that it is publicly available.
Dataset Splits | No | The paper mentions 'Train {Mtrain} and test {Mtest} tasks' and 'sample a task Mtrain i randomly from the train task distribution.' It does not explicitly mention a separate validation split used during model training.
Hardware Specification | No | The paper mentions distributed RL algorithms (e.g., 'A3C...with 100 actors', 'distributed DQN with 16 parallel actors'), which implies significant computational resources, but it does not specify any particular hardware, such as GPU models, CPU types, or cloud instance configurations.
Software Dependencies | No | The paper names the models and algorithms used (e.g., 'GPT causal transformer model', 'RNNs', 'LSTM', 'A3C', 'DQN') and specific regularization techniques, but it does not provide version numbers for any software libraries, frameworks (such as PyTorch or TensorFlow), or programming languages.
Experiment Setup | Yes | Table 1: Algorithm Distillation Architecture Hyperparameters. Table 2: Algorithm Distillation Optimization Hyperparameters. Table 3: Watermaze Image Encoder Hyperparameters. Table 4: Source A3C Algorithm Hyperparameters for Dark Environments. Table 5: Source DQN(Q-λ) Algorithm Hyperparameters for Watermaze. Table 6: RL2 Hyperparameters used in Dark Environments. Table 7: RL2 Hyperparameters used in the Watermaze Environment.
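
The Pseudocode and Open Datasets rows above point at the core recipe of Algorithm Distillation: collect the full learning histories of N single-task, gradient-based RL agents, then train a causal transformer to predict actions over long, across-episode slices of those histories. The sketch below illustrates those two stages, assuming a PyTorch-style API; the environment factory, source agent, and model are hypothetical placeholder interfaces, not the authors' implementation.

```python
# Minimal sketch of the two-stage Algorithm Distillation recipe (the paper's
# Algorithm 1), assuming a PyTorch-style API. The environment, source RL agent,
# and causal transformer are passed in as hypothetical callables/objects; nothing
# here is the authors' released code.

import random
import torch
import torch.nn.functional as F


def collect_learning_histories(make_env, make_agent, num_tasks, train_steps):
    """Stage 1: run one single-task, gradient-based RL agent per task and record
    its entire learning history as (obs, action, reward) tuples at every step."""
    histories = []
    for task_id in range(num_tasks):
        env = make_env(task_id)      # hypothetical task sampler
        agent = make_agent(env)      # e.g. the source A3C / DQN learner
        history, obs = [], env.reset()
        for _ in range(train_steps):
            action = agent.act(obs)
            next_obs, reward, done = env.step(action)
            history.append((obs, action, reward))
            agent.update(obs, action, reward, next_obs, done)  # source algorithm improves
            obs = env.reset() if done else next_obs
        histories.append(history)
    return histories


def batch_window(window):
    """Stack a list of (obs, action, reward) tuples into (1, T, ...) tensors."""
    obs = torch.stack([torch.as_tensor(o, dtype=torch.float32) for o, _, _ in window])
    actions = torch.tensor([a for _, a, _ in window], dtype=torch.long)
    rewards = torch.tensor([r for _, _, r in window], dtype=torch.float32)
    return obs.unsqueeze(0), actions.unsqueeze(0), rewards.unsqueeze(0)


def distill(histories, model, optimizer, context_len, num_updates):
    """Stage 2: behaviourally clone the source algorithm across episodes by training
    a causal transformer to predict each action from the preceding history slice."""
    for _ in range(num_updates):
        history = random.choice(histories)                 # pick one task's history
        start = random.randrange(len(history) - context_len)
        obs, actions, rewards = batch_window(history[start:start + context_len])
        logits = model(obs, actions, rewards)              # (1, T, num_actions)
        loss = F.cross_entropy(logits.flatten(0, 1), actions.flatten())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because the training context spans multiple episodes, the distilled transformer can keep improving its policy purely in-context at evaluation time, without gradient updates, which is the behaviour the paper evaluates on held-out tasks.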