Disentangling Transfer in Continual Reinforcement Learning
Authors: Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. We perform our experiments on the Continual World [47] benchmark. |
| Researcher Affiliation | Collaboration | Maciej Wołczyk, Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland (maciej.wolczyk@doctoral.uj.edu.pl); Michał Zając, Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland (emzajac@gmail.com); Razvan Pascanu, DeepMind, London, UK (razp@google.com); Łukasz Kuciński, Polish Academy of Sciences, Warsaw, Poland (lkucinski@impan.pl); Piotr Miłoś, Ideas NCBR, Polish Academy of Sciences, deepsense.ai, Warsaw, Poland (pmilos@impan.pl) |
| Pseudocode | No | The paper does not contain any pseudocode blocks or clearly labeled algorithms. |
| Open Source Code | Yes | The code, including the scripts used to run the experiments from the paper, is in the supplementary materials. |
| Open Datasets | Yes | We perform our experiments on the Continual World [47] benchmark. It contains a set of realistic robotic tasks, where a simulated Sawyer robot manipulates everyday objects. |
| Dataset Splits | No | The paper describes training and evaluation on task sequences (CW10, CW20) but does not specify explicit training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | This research was supported by the PL-Grid Infrastructure. We describe these details in Appendix F. |
| Software Dependencies | No | The paper mentions using SAC and MLP networks, and refers to Appendix A for more experimental setup details. However, specific version numbers for software dependencies (e.g., PyTorch version) are not provided in the main text or available appendices. |
| Experiment Setup | Yes | The actor and the critic are implemented as two separate MLP networks, each with 4 hidden layers of 256 neurons. By default, we assume the multi-head (MH) setting, where each task has its separate output head, but we also consider the single-head (SH) setting, where only a single head is used for all tasks. The SAC exploration phase takes K = 10k steps. All experiments in this paper were performed with 10 different seeds unless noted otherwise. |
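
To make the experiment-setup row concrete, below is a minimal sketch of the described critic layout: a 4-layer, 256-unit MLP torso with either one output head per task (multi-head, MH) or a single shared head (single-head, SH). The use of PyTorch, the class names, and the constructor arguments are assumptions for illustration only; they are not taken from the paper's released code.

```python
# Minimal sketch (assumed PyTorch) of the actor/critic layout described above:
# 4 hidden layers of 256 units, plus per-task output heads (MH) or one shared head (SH).
import torch
import torch.nn as nn


class MLPTorso(nn.Module):
    """Shared 4x256 MLP body, as specified in the experiment setup."""
    def __init__(self, in_dim, hidden=256, depth=4):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


class MultiHeadCritic(nn.Module):
    """Q-network: shared torso with one head per task (MH) or a single head (SH)."""
    def __init__(self, obs_dim, act_dim, num_tasks, multi_head=True):
        super().__init__()
        self.torso = MLPTorso(obs_dim + act_dim)
        n_heads = num_tasks if multi_head else 1
        self.heads = nn.ModuleList([nn.Linear(256, 1) for _ in range(n_heads)])

    def forward(self, obs, act, task_id):
        # In the SH setting there is only one head, so the task id is ignored.
        h = self.torso(torch.cat([obs, act], dim=-1))
        head = self.heads[task_id if len(self.heads) > 1 else 0]
        return head(h)
```

The actor would follow the same torso-plus-heads pattern, with each head producing the parameters of the policy distribution rather than a scalar Q-value; hyperparameters such as the 10k-step exploration phase sit in the training loop and are not shown here.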