Revisiting the Minimalist Approach to Offline Reinforcement Learning

Authors: Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods in both offline and offline-to-online settings. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.
Researcher Affiliation | Industry | Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov (Tinkoff), {den.tarasov, v.kurenkov, a.p.nikulin, s.s.kolesnikov}@tinkoff.ai
Pseudocode | No | The paper provides mathematical equations but does not include any pseudocode or clearly labeled algorithm blocks. (A hedged sketch of the update equations appears after the table.)
Open Source Code | Yes | Our implementation is available at https://github.com/DT6A/ReBRAC
Open Datasets | Yes | We evaluate the proposed approach on three sets of D4RL tasks: Gym-MuJoCo, AntMaze, and Adroit. For each domain, we consider all of the available datasets... In addition to testing ReBRAC on D4RL, we evaluated its performance on the V-D4RL benchmark (Lu et al., 2022). (A loading example follows the table.)
Dataset Splits | No | The paper uses the D4RL and V-D4RL benchmarks, which are established offline RL datasets. However, it does not explicitly describe how these datasets are split into training, validation, and test subsets. Instead, it tunes hyperparameters over 'training seeds' and evaluates over 'unseen training seeds', which refer to separate experimental runs that each use the full dataset.
Hardware Specification | Yes | The experiments were conducted on V100 and A100 GPUs.
Software Dependencies | No | The paper mentions JAX, PyTorch, and the Adam optimizer, but it does not specify version numbers for these components or for any other libraries.
Experiment Setup | Yes | For ReBRAC, we fine-tuned the β1 parameter for the actor, which was selected from {0.001, 0.01, 0.05, 0.1}. Similarly, the β2 parameter for the critic was selected from {0, 0.001, 0.01, 0.1, 0.5}. The selected best parameters for each dataset are reported in Table 11... batch size: 1024 on Gym-MuJoCo, 256 elsewhere; learning rate (all networks): 1e-3 on Gym-MuJoCo, 3e-4 on Adroit and V-D4RL, 1e-4 on AntMaze; tau (τ): 5e-3; hidden dim (all networks): 256; num hidden layers (all networks): 3; gamma (γ): 0.999 on AntMaze, 0.99 elsewhere; nonlinearity: ReLU. (These values are collected into a config sketch below.)
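
Since the Pseudocode row notes that the method is stated only as equations, here is a minimal NumPy sketch of ReBRAC's two decoupled behavior-cloning penalties as we read them from the paper: the actor objective is regularized toward dataset actions with weight β1, and the critic target is penalized with weight β2 by the gap between the target policy's next action and the dataset's next action. All function stubs (actor, critic_1, and so on) are hypothetical placeholders rather than the authors' implementation, and details such as Q-value normalization are omitted.

```python
# Hedged NumPy sketch of ReBRAC's decoupled actor/critic penalties.
# `actor`, `critic_1`, `actor_target`, `q1_target`, `q2_target` are
# hypothetical callables mapping batched states/actions to arrays.
import numpy as np

def actor_loss(actor, critic_1, s, a, beta_1):
    """TD3+BC-style actor objective with ReBRAC's actor penalty beta_1."""
    pi = actor(s)                                 # policy actions for a batch of states
    q = critic_1(s, pi)                           # critic's value of those actions
    bc = np.mean(np.sum((pi - a) ** 2, axis=-1))  # MSE to dataset actions
    return -np.mean(q) + beta_1 * bc

def critic_target(actor_target, q1_target, q2_target, r, s_next, a_next,
                  beta_2, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Clipped double-Q target with ReBRAC's critic penalty beta_2,
    applied to the gap between the policy's and the dataset's next action."""
    noise = np.clip(noise_std * np.random.randn(*a_next.shape),
                    -noise_clip, noise_clip)      # TD3-style target smoothing
    pi_next = np.clip(actor_target(s_next) + noise, -1.0, 1.0)
    q_next = np.minimum(q1_target(s_next, pi_next), q2_target(s_next, pi_next))
    penalty = np.sum((pi_next - a_next) ** 2, axis=-1)
    return r + gamma * (q_next - beta_2 * penalty)
```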
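
The D4RL datasets cited in the Open Datasets row are fetched through the standard d4rl API; the snippet below is one illustrative example (halfcheetah-medium-v2 stands in for any of the 51 datasets) and assumes gym and d4rl are installed.

```python
# Illustrative loading of one D4RL dataset via the standard API.
import gym
import d4rl  # noqa: F401  (the import registers the D4RL environments with gym)

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # keys: observations, actions, rewards,
                                       # next_observations, terminals
print({k: v.shape for k, v in dataset.items()})
```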
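
Finally, the shared hyperparameters quoted in the Experiment Setup row, collected into a plain Python dict for quick reference; the key names are our own shorthand, not identifiers from the ReBRAC codebase.

```python
# Shared ReBRAC hyperparameters as reported in the paper's setup; key names
# are our own shorthand, values are taken verbatim from the table row above.
rebrac_config = {
    "batch_size": {"gym_mujoco": 1024, "default": 256},
    "learning_rate": {"gym_mujoco": 1e-3, "adroit": 3e-4,
                      "v_d4rl": 3e-4, "antmaze": 1e-4},
    "tau": 5e-3,                                  # target-network soft-update rate
    "hidden_dim": 256,                            # all networks
    "num_hidden_layers": 3,                       # all networks
    "gamma": {"antmaze": 0.999, "default": 0.99},
    "nonlinearity": "ReLU",
    "beta_1_grid": [0.001, 0.01, 0.05, 0.1],      # actor penalty, tuned per dataset
    "beta_2_grid": [0.0, 0.001, 0.01, 0.1, 0.5],  # critic penalty, tuned per dataset
}
```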