Disentangling Transfer in Continual Reinforcement Learning
Authors: Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. We perform our experiments on the Continual World [47] benchmark. |
| Researcher Affiliation | Collaboration | Maciej Wołczyk, Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland (maciej.wolczyk@doctoral.uj.edu.pl); Michał Zając, Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland (emzajac@gmail.com); Razvan Pascanu, DeepMind, London, UK (razp@google.com); Łukasz Kuciński, Polish Academy of Sciences, Warsaw, Poland (lkucinski@impan.pl); Piotr Miłoś, Ideas NCBR, Polish Academy of Sciences, deepsense.ai, Warsaw, Poland (pmilos@impan.pl) |
| Pseudocode | No | The paper does not contain any pseudocode blocks or clearly labeled algorithms. |
| Open Source Code | Yes | The code, including the scripts used to run the experiments from the paper, is in the supplementary materials. |
| Open Datasets | Yes | We perform our experiments on the Continual World [47] benchmark. It contains a set of realistic robotic tasks, where a simulated Sawyer robot manipulates everyday objects. |
| Dataset Splits | No | The paper describes training and evaluation on task sequences (CW10, CW20) but does not specify explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | This research was supported by the PL-Grid Infrastructure. We describe these details in Appendix F. |
| Software Dependencies | No | The paper mentions using SAC and MLP networks, and refers to Appendix A for more experimental setup details. However, specific version numbers for software dependencies (e.g., PyTorch version) are not provided in the main text or available appendices. |
| Experiment Setup | Yes | The actor and the critic are implemented as two separate MLP networks, each with 4 hidden layers of 256 neurons. By default, we assume the multi-head (MH) setting, where each task has its separate output head, but we also consider the single-head (SH) setting, where only a single head is used for all tasks. The SAC exploration phase takes K = 10k steps. All experiments in this paper were performed with 10 different seeds unless noted otherwise. |
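
To make the experiment-setup row concrete, below is a minimal sketch of the described critic layout: a 4-layer, 256-unit MLP torso with either one output head per task (multi-head, MH) or a single shared head (single-head, SH). The use of PyTorch, the class names, and the constructor arguments are assumptions for illustration only; they are not taken from the paper's released code.

```python
# Minimal sketch (assumed PyTorch) of the actor/critic layout described above:
# 4 hidden layers of 256 units, plus per-task output heads (MH) or one shared head (SH).
import torch
import torch.nn as nn


class MLPTorso(nn.Module):
    """Shared 4x256 MLP body, as specified in the experiment setup."""
    def __init__(self, in_dim, hidden=256, depth=4):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


class MultiHeadCritic(nn.Module):
    """Q-network: shared torso with one head per task (MH) or a single head (SH)."""
    def __init__(self, obs_dim, act_dim, num_tasks, multi_head=True):
        super().__init__()
        self.torso = MLPTorso(obs_dim + act_dim)
        n_heads = num_tasks if multi_head else 1
        self.heads = nn.ModuleList([nn.Linear(256, 1) for _ in range(n_heads)])

    def forward(self, obs, act, task_id):
        # In the SH setting there is only one head, so the task id is ignored.
        h = self.torso(torch.cat([obs, act], dim=-1))
        head = self.heads[task_id if len(self.heads) > 1 else 0]
        return head(h)
```

The actor would follow the same torso-plus-heads pattern, with each head producing the parameters of the policy distribution rather than a scalar Q-value; hyperparameters such as the 10k-step exploration phase sit in the training loop and are not shown here.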