In-context Reinforcement Learning with Algorithm Distillation
Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Stenberg Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data. |
| Researcher Affiliation | Industry | Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih — DeepMind |
| Pseudocode | Yes | Algorithm 1 Algorithm Distillation |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | No | A dataset of learning histories is collected by training N individual single-task gradient-based RL algorithms. This dataset is generated by the authors, and there is no information or link provided indicating it is publicly available. |
| Dataset Splits | No | The paper mentions 'Train {Mtrain} and test {Mtest} tasks' and 'sample a task Mtrain i randomly from the train task distribution.' It does not explicitly mention a separate validation dataset split for model training. |
| Hardware Specification | No | The paper mentions distributed RL algorithms (e.g., 'A3C...with 100 actors', 'distributed DQN with 16 parallel actors') which implies significant computational resources, but it does not specify any particular hardware components like GPU models, CPU types, or cloud instance configurations. |
| Software Dependencies | No | The paper mentions various models and algorithms used (e.g., 'GPT causal transformer model', 'RNNs', 'LSTM', 'A3C', 'DQN') and specific regularization techniques, but it does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages. |
| Experiment Setup | Yes | Table 1: Algorithm Distillation Architecture Hyperparameters. Table 2: Algorithm Distillation Optimization Hyperparameters. Table 3: Watermaze Image Encoder Hyperparameters. Table 4: Source A3C Algorithm Hyperparameters for Dark Environments. Table 5: Source DQN(Q-λ) Algorithm Hyperparameters for Watermaze. Table 6: RL2 Hyperparameters used in Dark Environments. Table 7: RL2 Hyperparameters used in the Watermaze Environment. |
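The pseudocode the review points to (Algorithm 1, Algorithm Distillation) casts in-context RL as behavioral cloning over across-episode learning histories: a causal transformer is trained to predict the source algorithm's next action given a long context of preceding (observation, action, reward) triples. The data-preparation step can be sketched as below; this is our illustrative reading of Algorithm 1, not the authors' code, the `make_ad_training_examples` helper name is hypothetical, and the transformer policy itself is omitted.

```python
def make_ad_training_examples(history, context_len):
    """Build (context, target_action) pairs from one task's learning history.

    `history` is a time-ordered list of (observation, action, reward)
    triples spanning the source RL algorithm's whole training run, so a
    context window can straddle episode boundaries -- the property that
    lets the distilled model imitate policy *improvement*, not just a
    fixed policy.
    """
    examples = []
    for t in range(1, len(history)):
        start = max(0, t - context_len)
        context = history[start:t]   # across-episode context fed to the model
        target = history[t][1]       # next action, the cloning target
        examples.append((context, target))
    return examples
```

In the paper the context length is chosen to span multiple episodes, since a context shorter than one episode cannot capture the learning progress present in the source data.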