Revisiting the Minimalist Approach to Offline Reinforcement Learning
Authors: Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods in both offline and offline-to-online settings. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments. |
| Researcher Affiliation | Industry | Denis Tarasov Vladislav Kurenkov Alexander Nikulin Sergey Kolesnikov Tinkoff {den.tarasov, v.kurenkov, a.p.nikulin, s.s.kolesnikov}@tinkoff.ai |
| Pseudocode | No | The paper provides mathematical equations but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/DT6A/ReBRAC |
| Open Datasets | Yes | We evaluate the proposed approach on three sets of D4RL tasks: Gym-MuJoCo, AntMaze, and Adroit. For each domain, we consider all of the available datasets... In addition to testing ReBRAC on D4RL, we evaluated its performance on the V-D4RL benchmark (Lu et al., 2022). |
| Dataset Splits | No | The paper uses the D4RL and V-D4RL benchmarks, which are established offline RL datasets. However, it does not explicitly describe how these datasets are split into training, validation, and test subsets for the experiments; instead, it discusses hyperparameter tuning over 'training seeds' and evaluation over 'unseen training seeds', which refer to different experimental runs using the full dataset. |
| Hardware Specification | Yes | The experiments were conducted on V100 and A100 GPUs. |
| Software Dependencies | No | The paper mentions JAX, PyTorch, and the Adam optimizer, but it does not specify version numbers for these or any other libraries. |
| Experiment Setup | Yes | For ReBRAC, we fine-tuned the β1 parameter for the actor, which was selected from {0.001, 0.01, 0.05, 0.1}. Similarly, the β2 parameter for the critic was selected from {0, 0.001, 0.01, 0.1, 0.5}. The selected best parameters for each dataset are reported in Table 11... batch size 1024 on Gym-MuJoCo, 256 on others; learning rate (all networks) 1e-3 on Gym-MuJoCo, 3e-4 on Adroit and V-D4RL, 1e-4 on AntMaze; tau (τ) 5e-3; hidden dim (all networks) 256; num hidden layers (all networks) 3; gamma (γ) 0.999 on AntMaze, 0.99 on others; nonlinearity ReLU. (A configuration sketch collecting these values is shown below the table.) |
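
To make the reported setup easier to scan, here is a minimal sketch that gathers the hyperparameters quoted in the Experiment Setup row into a single Python configuration object. The class and field names (`ReBRACConfig`, `actor_beta1`, etc.) are illustrative assumptions rather than the authors' code; the official implementation is at https://github.com/DT6A/ReBRAC.

```python
# Hedged sketch: hyperparameters reported in the paper, gathered into one
# illustrative config. Names are assumptions; values are taken from the
# quoted Experiment Setup row, with Gym-MuJoCo defaults shown.
from dataclasses import dataclass


@dataclass
class ReBRACConfig:
    # Behavioral-cloning penalty coefficients (tuned per dataset; see Table 11 of the paper)
    actor_beta1: float = 0.01       # selected from {0.001, 0.01, 0.05, 0.1}
    critic_beta2: float = 0.01      # selected from {0, 0.001, 0.01, 0.1, 0.5}
    # Optimization
    batch_size: int = 1024          # Gym-MuJoCo; 256 on other domains
    learning_rate: float = 1e-3     # Gym-MuJoCo; 3e-4 on Adroit/V-D4RL, 1e-4 on AntMaze
    # Network architecture (all networks)
    hidden_dim: int = 256
    num_hidden_layers: int = 3
    nonlinearity: str = "relu"
    # Target updates and discounting
    tau: float = 5e-3               # soft target-network update rate
    gamma: float = 0.99             # 0.999 on AntMaze


# Example: instantiate the Gym-MuJoCo defaults above.
config = ReBRACConfig()
```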