A Minimalist Approach to Offline Reinforcement Learning
Authors: Scott Fujimoto, Shixiang (Shane) Gu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our minimal changes to the TD3 algorithm on the D4RL benchmark of continuous control tasks [Fu et al., 2020]. We find that our algorithm compares favorably against many offline RL algorithms, while being significantly easier to implement and more than halving the required computation cost. |
| Researcher Affiliation | Collaboration | ¹Mila, McGill University; ²Google Research, Brain Team |
| Pseudocode | No | The paper describes the algorithmic changes using mathematical equations (Equations 1, 2, 3, 4, 5) and textual descriptions, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | To accommodate reproducibility, all of our code is open-sourced¹. ¹https://github.com/sfujim/TD3_BC |
| Open Datasets | Yes | We evaluate our proposed approach on the D4RL benchmark of OpenAI gym MuJoCo tasks [Todorov et al., 2012, Brockman et al., 2016, Fu et al., 2020] |
| Dataset Splits | Yes | We train each algorithm for 1 million time steps and evaluate every 5000 time steps. Each evaluation consists of 10 episodes. |
| Hardware Specification | Yes | All run time experiments were run with a single GeForce GTX 1080 GPU and an Intel Core i7-6700K CPU at 4.00GHz. |
| Software Dependencies | No | The paper mentions using 'PyTorch [Paszke et al., 2019]' as a framework for re-implementing Fisher-BRC, but it does not specify version numbers for PyTorch or any other software dependencies crucial for reproducibility of the experiments described in the paper. |
| Experiment Setup | Yes | We train each algorithm for 1 million time steps and evaluate every 5000 time steps. Each evaluation consists of 10 episodes. ... We use α = 2.5 in our experiments. ... Secondly, we normalize the features of every state in the provided dataset. |
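
The Experiment Setup row names two concrete ingredients: the weighting hyperparameter α = 2.5 and per-feature normalization of the dataset states. The sketch below illustrates how these could fit together, assuming the commonly described TD3+BC form in which the Q-value term is scaled by λ = α / mean(|Q|) and a squared-error behavior cloning term is added. The function names, the `EPS` constant, and the use of plain Python lists are illustrative assumptions, not taken from the authors' repository.

```python
from statistics import fmean, pstdev

ALPHA = 2.5  # value quoted in the Experiment Setup row
EPS = 1e-3   # assumed small constant that guards against zero variance


def normalize_states(states):
    """Standardize every state feature using dataset-wide mean and std."""
    columns = list(zip(*states))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) + EPS for col in columns]
    normalized = [
        [(x - m) / s for x, m, s in zip(row, means, stds)] for row in states
    ]
    return normalized, means, stds


def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=ALPHA):
    """Return -lambda * mean(Q) + mean squared action error for one batch.

    In a gradient-based implementation, lambda would be treated as a
    constant (no gradient flows through the normalizer).
    """
    lam = alpha / fmean([abs(q) for q in q_values])
    bc_terms = [
        (p - a) ** 2
        for pol, data in zip(policy_actions, dataset_actions)
        for p, a in zip(pol, data)
    ]
    return -lam * fmean(q_values) + fmean(bc_terms)


# Hypothetical toy batch: two 2-D states, two 1-D actions.
states, means, stds = normalize_states([[0.0, 10.0], [2.0, 10.0]])
loss = td3_bc_actor_loss(
    q_values=[1.0, 3.0],
    policy_actions=[[0.5], [0.0]],
    dataset_actions=[[0.0], [0.0]],
)
```

Dividing by the mean absolute Q-value keeps the balance between the two terms roughly independent of the reward scale, which is why a single α can be shared across tasks. The same `means` and `stds` would need to be applied to states at evaluation time.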