Continual World: A Robotic Benchmark For Continual Reinforcement Learning

Authors: Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Following an in-depth empirical evaluation of existing CL methods, we pinpoint their limitations and highlight unique algorithmic challenges in the RL setting. Our benchmark aims to provide a meaningful and computationally inexpensive challenge for the community and thus help better understand the performance of existing and future solutions.
Researcher Affiliation | Collaboration | Maciej Wołczyk, Jagiellonian University, Kraków, Poland; Michał Zając, Jagiellonian University, Kraków, Poland; Razvan Pascanu, DeepMind, London, UK; Łukasz Kuciński, Polish Academy of Sciences, Warsaw, Poland; Piotr Miłoś, Polish Academy of Sciences, University of Oxford, deepsense.ai, Warsaw, Poland
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Information about the benchmark, including the open-source code, is available at https://sites.google.com/view/continualworld.
Open Datasets | Yes | The benchmark is built on realistic robotic manipulation tasks from Meta-World [54], benefiting from its diversity but also being computationally cheap. (See the environment-construction sketch after this table.)
Dataset Splits | No | The paper mentions a "validation task" for hyperparameter tuning but does not specify a training/validation/test split as percentages or counts of data.
Hardware Specification | Yes | We use 8-core machines without GPU. Training the CW20 sequence of twenty tasks takes about 100 hours.
Software Dependencies | No | The paper mentions using "Meta-World" and "soft actor-critic (SAC)" and building on "clean-rl [1]" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We use an MLP network with 4 layers of 256 neurons. For training, we use soft actor-critic (SAC) [17]... We train each task for 1M environment steps, using a batch size of 256. We use Adam optimizer [26] with learning rate 3e-4 and the SAC hyperparameters from clean-rl library for actor-critic architecture, and with target entropy set to -4. We used γ = 0.99 for all the tasks. The neural network architecture is MLP with 4 layers of 256 neurons with ReLU activations. We used LayerNorm [7] layers after each linear layer. The output layers do not use LayerNorm. (See the architecture sketch after this table.)
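
Since the benchmark is built on Meta-World tasks, the sketch below illustrates how such a task is typically instantiated, following the pattern in the Meta-World README. The task name 'pick-place-v2' is illustrative only; Continual World draws its own task sequences from the same suite, and the exact API details depend on the installed Meta-World version.

```python
import random
import metaworld

# Construct the ML1 benchmark for a single manipulation task.
# 'pick-place-v2' is an illustrative task name, not necessarily one
# of the tasks Continual World uses.
ml1 = metaworld.ML1('pick-place-v2')

env = ml1.train_classes['pick-place-v2']()  # instantiate the environment
task = random.choice(ml1.train_tasks)       # sample a goal variant
env.set_task(task)

# Random actions for illustration only. Note: newer Gymnasium-based
# Meta-World releases return (obs, info) from reset() and a 5-tuple
# from step(); the 4-tuple below matches older versions.
obs = env.reset()
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
```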
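
The quoted setup pins down the network precisely: an MLP with 4 hidden layers of 256 ReLU units, LayerNorm after every linear layer except the output, trained with Adam at learning rate 3e-4. Below is a minimal sketch of that architecture in PyTorch. The paper's reference code is not necessarily PyTorch, the Linear → LayerNorm → ReLU ordering is an assumption (the paper only says LayerNorm follows each linear layer), and the input/output dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ContinualWorldMLP(nn.Module):
    """Sketch of the quoted setup: 4 hidden layers of 256 units,
    LayerNorm after each linear layer, ReLU activations, and a plain
    linear output layer (no LayerNorm)."""

    def __init__(self, in_dim: int, out_dim: int,
                 hidden: int = 256, depth: int = 4):
        super().__init__()
        layers = []
        d = in_dim
        for _ in range(depth):
            # Linear -> LayerNorm -> ReLU ordering is an assumption.
            layers += [nn.Linear(d, hidden), nn.LayerNorm(hidden), nn.ReLU()]
            d = hidden
        layers.append(nn.Linear(d, out_dim))  # output layer: no LayerNorm
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Optimizer as quoted: Adam with learning rate 3e-4. Batch size 256,
# gamma = 0.99, and target entropy -4 belong to the SAC training loop.
net = ContinualWorldMLP(in_dim=39, out_dim=4)  # placeholder dimensions
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
```

This covers only the network and optimizer; the full SAC update (twin critics, target networks, entropy temperature) follows the standard algorithm cited in the paper.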