Continual World: A Robotic Benchmark For Continual Reinforcement Learning

Authors: Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Following an in-depth empirical evaluation of existing CL methods, we pinpoint their limitations and highlight unique algorithmic challenges in the RL setting. Our benchmark aims to provide a meaningful and computationally inexpensive challenge for the community and thus help better understand the performance of existing and future solutions.
Researcher Affiliation | Collaboration | Maciej Wołczyk, Jagiellonian University, Kraków, Poland; Michał Zając, Jagiellonian University, Kraków, Poland; Razvan Pascanu, DeepMind, London, UK; Łukasz Kuciński, Polish Academy of Sciences, Warsaw, Poland; Piotr Miłoś, Polish Academy of Sciences, University of Oxford, deepsense.ai, Warsaw, Poland
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Information about the benchmark, including the open-source code, is available at https://sites.google.com/view/continualworld.
Open Datasets | Yes | The benchmark is built on realistic robotic manipulation tasks from Meta-World [54], benefiting from its diversity but also being computationally cheap. (See the environment-construction sketch after this table.)
Dataset Splits | No | The paper mentions a "validation task" for hyperparameter tuning but does not specify a training/validation/test split as percentages or counts of data.
Hardware Specification | Yes | We use 8-core machines without GPU. Training the CW20 sequence of twenty tasks takes about 100 hours.
Software Dependencies | No | The paper mentions using "Meta-World" and "soft actor-critic (SAC)" and building on "clean-rl [1]" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We use an MLP network with 4 layers of 256 neurons. For training, we use soft actor-critic (SAC) [17]... We train each task for 1M environment steps, using a batch size of 256. We use Adam optimizer [26] with learning rate 3e-4 and the SAC hyperparameters from clean-rl library for actor-critic architecture, and with target entropy set to -4. We used γ = 0.99 for all the tasks. The neural network architecture is MLP with 4 layers of 256 neurons with ReLU activations. We used LayerNorm [7] layers after each linear layer. The output layers do not use LayerNorm. (See the architecture sketch after this table.)
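
Since the benchmark is built on Meta-World tasks, the sketch below illustrates how such a task is typically instantiated, following the pattern in the Meta-World README. The task name 'pick-place-v2' is illustrative only; Continual World draws its own task sequences from the same suite, and the exact API details depend on the installed Meta-World version.

```python
import random
import metaworld

# Construct the ML1 benchmark for a single manipulation task.
# 'pick-place-v2' is an illustrative task name, not necessarily one
# of the tasks Continual World uses.
ml1 = metaworld.ML1('pick-place-v2')

env = ml1.train_classes['pick-place-v2']()  # instantiate the environment
task = random.choice(ml1.train_tasks)       # sample a goal variant
env.set_task(task)

# Random actions for illustration only. Note: newer Gymnasium-based
# Meta-World releases return (obs, info) from reset() and a 5-tuple
# from step(); the 4-tuple below matches older versions.
obs = env.reset()
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
```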
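
The quoted setup pins down the network precisely: an MLP with 4 hidden layers of 256 ReLU units, LayerNorm after every linear layer except the output, trained with Adam at learning rate 3e-4. Below is a minimal sketch of that architecture in PyTorch. The paper's reference code is not necessarily PyTorch, the Linear → LayerNorm → ReLU ordering is an assumption (the paper only says LayerNorm follows each linear layer), and the input/output dimensions are placeholders.

```python
import torch
import torch.nn as nn

class ContinualWorldMLP(nn.Module):
    """Sketch of the quoted setup: 4 hidden layers of 256 units,
    LayerNorm after each linear layer, ReLU activations, and a plain
    linear output layer (no LayerNorm)."""

    def __init__(self, in_dim: int, out_dim: int,
                 hidden: int = 256, depth: int = 4):
        super().__init__()
        layers = []
        d = in_dim
        for _ in range(depth):
            # Linear -> LayerNorm -> ReLU ordering is an assumption.
            layers += [nn.Linear(d, hidden), nn.LayerNorm(hidden), nn.ReLU()]
            d = hidden
        layers.append(nn.Linear(d, out_dim))  # output layer: no LayerNorm
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Optimizer as quoted: Adam with learning rate 3e-4. Batch size 256,
# gamma = 0.99, and target entropy -4 belong to the SAC training loop.
net = ContinualWorldMLP(in_dim=39, out_dim=4)  # placeholder dimensions
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
```

This covers only the network and optimizer; the full SAC update (twin critics, target networks, entropy temperature) follows the standard algorithm cited in the paper.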