Disentangling Transfer in Continual Reinforcement Learning

Authors: Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. We perform our experiments on the Continual World [47] benchmark.
Researcher Affiliation | Collaboration | Maciej Wołczyk (Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland; maciej.wolczyk@doctoral.uj.edu.pl); Michał Zając (Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland; emzajac@gmail.com); Razvan Pascanu (DeepMind, London, UK; razp@google.com); Łukasz Kuciński (Polish Academy of Sciences, Warsaw, Poland; lkucinski@impan.pl); Piotr Miłoś (Ideas NCBR, Polish Academy of Sciences, deepsense.ai, Warsaw, Poland; pmilos@impan.pl)
Pseudocode | No | The paper does not contain any pseudocode blocks or clearly labeled algorithms.
Open Source Code | Yes | The code, including the scripts used to run the experiments from the paper, is in the supplementary materials.
Open Datasets | Yes | We perform our experiments on the Continual World [47] benchmark. It contains a set of realistic robotic tasks, where a simulated Sawyer robot manipulates everyday objects.
Dataset Splits | No | The paper describes training and evaluation on task sequences (CW10, CW20) but does not specify explicit training/validation/test dataset splits with percentages or sample counts.
Hardware Specification | Yes | This research was supported by the PL-Grid Infrastructure. We describe these details in Appendix F.
Software Dependencies | No | The paper mentions using SAC and MLP networks and refers to Appendix A for further experimental-setup details, but specific version numbers for software dependencies (e.g., the PyTorch version) are not provided in the main text or the available appendices.
Experiment Setup | Yes | The actor and the critic are implemented as two separate MLP networks, each with 4 hidden layers of 256 neurons. By default, we assume the multi-head (MH) setting, where each task has its separate output head, but we also consider the single-head (SH) setting, where only a single head is used for all tasks. The SAC exploration phase takes K = 10k steps. All experiments in this paper were performed with 10 different seeds unless noted otherwise.
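The network shape quoted above (a shared trunk of 4 hidden layers × 256 neurons, with one output head per task in the multi-head setting) can be sketched as a minimal NumPy forward pass. This is an illustrative sketch only: the input dimension, output dimension, head count, and He-style initialization below are assumptions, not details taken from the paper, and it does not reproduce the actual SAC actor/critic implementation.

```python
import numpy as np

def init_mlp(in_dim, hidden=256, depth=4, num_heads=10, out_dim=4, seed=0):
    """Build weights for a trunk of `depth` hidden layers of `hidden` units,
    plus one linear output head per task (the multi-head, MH, setting).
    `in_dim`, `out_dim`, and `num_heads` are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * depth
    trunk = [
        (rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]
    heads = [
        (rng.standard_normal((hidden, out_dim)) * np.sqrt(2.0 / hidden), np.zeros(out_dim))
        for _ in range(num_heads)
    ]
    return trunk, heads

def forward(trunk, heads, x, task_id):
    """Shared ReLU trunk, then the linear head selected by `task_id`."""
    h = x
    for W, b in trunk:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layer
    W, b = heads[task_id]
    return h @ W + b
```

In the single-head (SH) variant described in the paper, the same sketch would simply route every task through one shared head (e.g. always `heads[0]`) instead of indexing by `task_id`.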