Building a Subspace of Policies for Scalable Continual Learning

Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on 18 CRL scenarios from two different domains, locomotion in Brax and robotic manipulation in Continual World, a challenging CRL benchmark (Wołczyk et al., 2021). We also compare CSP with a number of popular CRL baselines, including both fixed-size and growing-size methods.
Researcher Affiliation | Collaboration | Jean-Baptiste Gaya — Meta AI Research; CNRS-ISIR, Sorbonne University, Paris, France (jbgaya@meta.com); Thang Doan — McGill University, Mila (now at Bosch Research) (thang.doan@mail.mcgill.ca); Lucas Caccia — McGill University, Mila (lucas.page-caccia@mail.mcgill.ca); Laure Soulier — CNRS-ISIR, Sorbonne University, Paris, France (laure.soulier@isir.upmc.fr); Ludovic Denoyer — Ubisoft, France (ludovic.denoyer@ubisoft.com); Roberta Raileanu — Meta AI Research (raileanu@meta.com)
Pseudocode | Yes | Pseudocode is available in Appendix C.1.
Open Source Code | Yes | Code is available here.
Open Datasets | Yes | We evaluate CSP on 18 CRL scenarios containing 35 different RL tasks, from two continuous control domains, locomotion in Brax (Freeman et al., 2021) and robotic manipulation in Continual World (CW, Wołczyk et al. (2021)), a challenging CRL benchmark.
Dataset Splits | No | The paper mentions training budgets ('budget of 1M interactions for each task') and evaluation procedures, but does not specify explicit train/validation/test data splits or proportions for the datasets used.
Hardware Specification | Yes | Each algorithm was trained using one Intel(R) Xeon(R) CPU core (E5-2698 v4 @ 2.20GHz) and one NVIDIA V100 GPU.
Software Dependencies | No | The paper states: 'All the experiments were implemented with SaLinA (Denoyer et al., 2021)... We used Soft Actor Critic (Haarnoja et al., 2018b) as the routine algorithm for each method.' However, specific version numbers for these or other software libraries (e.g., Python, PyTorch) are not provided.
Experiment Setup | Yes | We run a grid search on SAC hyper-parameters (see Table 3) on FT-N and select the best set in terms of final average performance (see Section 4 for the details about this metric). Then, we freeze these hyper-parameters and perform a separate grid search for CSP and each baseline (see Table 4). Each hyper-parameter set is evaluated over 10 seeds.
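The two-stage protocol described above can be sketched in a few lines: grid-search the shared SAC hyper-parameters first, freeze the best set, then grid-search each method's own hyper-parameters, scoring every configuration by its final performance averaged over several seeds. This is a minimal illustrative sketch, not the paper's code; `train_and_evaluate`, the grid keys, and all candidate values are hypothetical stand-ins.

```python
import itertools
import random
import statistics

def train_and_evaluate(params, seed):
    """Hypothetical stand-in for training SAC with `params` on a CRL
    scenario and returning the final average performance for one seed."""
    rng = random.Random(hash(tuple(sorted(params.items()))) ^ seed)
    return rng.uniform(0.0, 1.0)

def grid_search(grid, fixed, n_seeds=10):
    """Try every combination in `grid` (merged with the frozen `fixed`
    hyper-parameters), averaging performance over `n_seeds` seeds."""
    best_params, best_score = None, float("-inf")
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = {**fixed, **dict(zip(keys, values))}
        score = statistics.mean(
            train_and_evaluate(params, seed) for seed in range(n_seeds)
        )
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stage 1: shared SAC hyper-parameters (illustrative values only).
sac_grid = {"lr": [3e-4, 1e-3], "batch_size": [128, 256]}
sac_best, _ = grid_search(sac_grid, fixed={})

# Stage 2: freeze the SAC settings, then search the method-specific
# hyper-parameters for CSP or a baseline on top of them.
method_grid = {"threshold": [0.05, 0.1]}
method_best, _ = grid_search(method_grid, fixed=sac_best)
```

Freezing the stage-1 result before stage 2 keeps the comparison fair: every method inherits the same SAC configuration and is tuned only on its own hyper-parameters.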