Building a Subspace of Policies for Scalable Continual Learning
Authors: Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on 18 CRL scenarios from two different domains, locomotion in Brax and robotic manipulation in Continual World, a challenging CRL benchmark (Wołczyk et al., 2021). We also compare CSP with a number of popular CRL baselines, including both fixed-size and growing-size methods. |
| Researcher Affiliation | Collaboration | Jean-Baptiste Gaya Meta AI Research; CNRS-ISIR, Sorbonne University, Paris, France jbgaya@meta.com Thang Doan McGill University, Mila (now at Bosch Research) thang.doan@mail.mcgill.ca Lucas Caccia McGill University, Mila lucas.page-caccia@mail.mcgill.ca Laure Soulier CNRS-ISIR, Sorbonne University, Paris, France laure.soulier@isir.upmc.fr Ludovic Denoyer Ubisoft France ludovic.denoyer@ubisoft.com Roberta Raileanu Meta AI Research raileanu@meta.com |
| Pseudocode | Yes | Pseudocode is available in Appendix C.1. |
| Open Source Code | Yes | Code is available here. |
| Open Datasets | Yes | We evaluate CSP on 18 CRL scenarios containing 35 different RL tasks, from two continuous control domains, locomotion in Brax (Freeman et al., 2021) and robotic manipulation in Continual World (CW, Wołczyk et al. (2021)), a challenging CRL benchmark. |
| Dataset Splits | No | The paper mentions training budgets ('budget of 1M interactions for each task') and evaluation procedures, but does not specify explicit train/validation/test data splits or proportions for the datasets used. |
| Hardware Specification | Yes | Each algorithm was trained using one Intel(R) Xeon(R) CPU core (E5-2698 v4 @ 2.20GHz) and one NVIDIA V100 GPU. |
| Software Dependencies | No | The paper states: 'All the experiments were implemented with SaLinA (Denoyer et al., 2021)... We used Soft Actor-Critic (Haarnoja et al., 2018b) as the routine algorithm for each method.' However, specific version numbers for these or other software libraries (e.g., Python, PyTorch) are not provided. |
| Experiment Setup | Yes | We run a gridsearch on SAC hyper-parameters (see Table 3) on FT-N and select the best set in terms of final average performance (see Section 4 for the details about this metric). Then, we freeze these hyper-parameters and perform a specific gridsearch for CSP and each baseline (see Table 4). Each hyper-parameter set is evaluated over 10 seeds. |