Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

Authors: Nico Gürtler, Sebastian Blaes, Pavel Kolev, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Bernhard Schölkopf, Georg Martius

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
Researcher Affiliation | Academia | Max Planck Institute for Intelligent Systems; Harvard University; KTH Stockholm
Pseudocode | No | The paper describes methods and processes in narrative text and uses figures/tables, but it does not include formal pseudocode blocks or algorithm listings.
Open Source Code | No | The paper mentions using and evaluating 'open-sourced offline reinforcement learning algorithms' (such as those in d3rlpy) and notes that the TriFinger platform itself is open-source. However, it does not provide an explicit statement or link releasing the authors' own implementation code for the benchmarking experiments described in the paper.
Open Datasets | Yes | We publish the datasets we propose as benchmarks (sections 3.3 and B) and provide access to the cluster of real TriFinger platforms (section 2) we used for data collection. ... For installation instructions and further details we refer to the repository of the Python package at https://github.com/rr-learning/trifinger_rl_datasets.
Dataset Splits | No | The paper describes how the datasets (e.g., Expert, Mixed, Weak&Expert) were collected and how the algorithms were trained on them, but it does not specify explicit training, validation, and test splits for the offline RL training process. Evaluation with a fixed set of randomly sampled goals implies a test set, but no validation split is described.
Hardware Specification | No | The paper describes the TriFinger robot platform in detail, including its degrees of freedom and motors, but it does not specify the computing hardware (e.g., CPUs or specific GPU models) used to run the RL algorithms or simulations. It only mentions a 'GPU-accelerated rigid body physics simulator' for expert policy training and that 'the robot cluster does not provide GPU-access'.
Software Dependencies | Yes | We use the implementations of BC, CRR, AWAC, CQL and IQL provided by the open-source library D3RLPY (Seno & Imai, 2021). For our experiments we used versions 1.1.0 and 1.1.1 of D3RLPY.
Experiment Setup | Yes | We train with five different seeds for each algorithm and evaluate with a fixed set of randomly sampled goals. ... The number of episodes collected per job is eight for the Push task (15 s per episode) and six for the Lift task (30 s per episode). ... We choose 10 ms for the Push task and 2 ms for the Lift task as we found training with bigger delays difficult. ... We performed a grid search over hyperparameters for all algorithms as documented in Table S7. The hyperparameter setting with the highest performance in terms of final average return on Lift-Sim Weak&Expert was selected, as listed in Table S8. In the paper, the results with optimized parameters are marked with a . Otherwise, the default parameters were used, as listed in Table S9.
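
To make the dataset access noted under "Open Datasets" concrete, the following is a minimal loading sketch using the trifinger_rl_datasets package referenced above. It assumes the D4RL-style Gym interface described in the paper; the exact environment ID, the choice of gym vs. gymnasium, and the returned dictionary keys should be checked against the repository and are illustrative here.

```python
# Hedged sketch: loading one of the benchmark datasets with the
# trifinger_rl_datasets package
# (https://github.com/rr-learning/trifinger_rl_datasets).
# The environment ID below is illustrative; consult the repository for the
# exact registered names and installation instructions.
import gym
import trifinger_rl_datasets  # noqa: F401  (registers the dataset environments)

env = gym.make("trifinger-cube-push-sim-expert-v0")

# D4RL-style access: a dict of flat NumPy arrays with transitions.
dataset = env.get_dataset()
print(dataset["observations"].shape)
print(dataset["actions"].shape)
print(dataset["rewards"].shape)
```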
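The "Software Dependencies" row pins d3rlpy 1.1.0/1.1.1. The sketch below shows how one of the evaluated algorithms (IQL) could be trained offline with that API generation; the dummy arrays stand in for TriFinger data, and all shapes and step counts are illustrative rather than taken from the paper.

```python
import numpy as np
import d3rlpy

# Dummy transition arrays standing in for a TriFinger dataset; in practice
# these would come from env.get_dataset() as in the sketch above.
# Observation/action dimensions and episode lengths are illustrative.
num_steps, obs_dim, act_dim = 1000, 97, 9
observations = np.random.randn(num_steps, obs_dim).astype(np.float32)
actions = np.random.uniform(-1.0, 1.0, size=(num_steps, act_dim)).astype(np.float32)
rewards = np.random.randn(num_steps).astype(np.float32)
terminals = np.zeros(num_steps, dtype=np.float32)
terminals[249::250] = 1.0  # mark episode ends every 250 steps

# d3rlpy 1.x dataset container (the 2.x API differs).
dataset = d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals)

# IQL with default hyperparameters; the paper also evaluates BC, CRR, AWAC, CQL.
iql = d3rlpy.algos.IQL(use_gpu=False)

# Offline training for a fixed number of gradient steps (counts illustrative).
iql.fit(dataset, n_steps=10000, n_steps_per_epoch=1000)
```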
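Finally, the selection protocol described under "Experiment Setup" (grid search over hyperparameters, five seeds per configuration, selection by final average return on Lift-Sim Weak&Expert) can be summarized as a short sketch. The grid values, seed values, and the train_and_evaluate helper are hypothetical placeholders, not the entries of Table S7, and averaging the final return over seeds is an illustrative choice.

```python
import itertools

# Hypothetical grid; the actual search space is documented in Table S7 of the paper.
GRID = {
    "actor_learning_rate": [3e-4, 1e-3],
    "batch_size": [256, 1024],
}
SEEDS = [0, 1, 2, 3, 4]  # five seeds per configuration; values are placeholders

def select_best(train_and_evaluate):
    """Return the configuration with the highest final average return.

    train_and_evaluate(config, seed) is a hypothetical helper that trains one
    algorithm on Lift-Sim Weak&Expert with the given seed and returns its
    final average return.
    """
    best_config, best_return = None, float("-inf")
    for values in itertools.product(*GRID.values()):
        config = dict(zip(GRID.keys(), values))
        seed_returns = [train_and_evaluate(config, seed) for seed in SEEDS]
        avg_return = sum(seed_returns) / len(seed_returns)
        if avg_return > best_return:
            best_config, best_return = config, avg_return
    return best_config, best_return
```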