Continual World: A Robotic Benchmark For Continual Reinforcement Learning
Authors: Maciej Wołczyk, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Following an in-depth empirical evaluation of existing CL methods, we pinpoint their limitations and highlight unique algorithmic challenges in the RL setting. Our benchmark aims to provide a meaningful and computationally inexpensive challenge for the community and thus help better understand the performance of existing and future solutions. |
| Researcher Affiliation | Collaboration | Maciej Wołczyk, Jagiellonian University, Kraków, Poland; Michał Zając, Jagiellonian University, Kraków, Poland; Razvan Pascanu, DeepMind, London, UK; Łukasz Kuciński, Polish Academy of Sciences, Warsaw, Poland; Piotr Miłoś, Polish Academy of Sciences, University of Oxford, deepsense.ai, Warsaw, Poland |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Information about the benchmark, including the open-source code, is available at https://sites.google.com/view/continualworld. |
| Open Datasets | Yes | The benchmark is built on realistic robotic manipulation tasks from Meta-World [54], benefiting from its diversity but also being computationally cheap. |
| Dataset Splits | No | The paper mentions a "validation task" for hyperparameter tuning but does not specify a training/validation/test dataset split in terms of percentages or counts of data. |
| Hardware Specification | Yes | We use 8-core machines without GPU. Training the CW20 sequence of twenty tasks takes about 100 hours. |
| Software Dependencies | No | The paper mentions using "Meta-World" and "soft actor-critic (SAC)" and building on "clean-rl [1]" but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use an MLP network with 4 layers of 256 neurons and ReLU activations, with LayerNorm [7] after each linear layer except the output layer. For training, we use soft actor-critic (SAC) [17]. We train each task for 1M environment steps, using a batch size of 256. We use the Adam optimizer [26] with learning rate 3e-4 and the SAC hyperparameters from the clean-rl library for the actor-critic architecture, with target entropy set to -4. We used γ = 0.99 for all tasks. |
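The architecture described in the setup row (4 hidden layers of 256 units, ReLU activations, LayerNorm after each hidden linear layer but not the output) can be sketched as a plain NumPy forward pass. This is a minimal illustration, not the authors' implementation; the 39-dim observation and 4-dim action sizes are assumptions chosen to resemble Meta-World tasks.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize activations over the feature dimension (LayerNorm, no affine params).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp_forward(obs, weights, biases):
    """Forward pass matching the described architecture:
    Linear -> LayerNorm -> ReLU for each hidden layer,
    plain Linear on the output (no LayerNorm)."""
    h = obs
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(layer_norm(h @ W + b), 0.0)  # ReLU after LayerNorm
    return h @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
# Hypothetical sizes: 39-dim observations, 4 hidden layers of 256, 4-dim actions.
sizes = [39, 256, 256, 256, 256, 4]
weights = [rng.normal(scale=0.1, size=(i, o)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]

out = mlp_forward(rng.normal(size=(8, 39)), weights, biases)
print(out.shape)  # one 4-dim output per observation in the batch of 8
```

In SAC, two such networks would serve as critics (outputting a scalar Q-value) and one as the actor (outputting action-distribution parameters); only the head sizes differ.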