Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Minimalist Approach to Offline Reinforcement Learning
Authors: Scott Fujimoto, Shixiang (Shane) Gu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our minimal changes to the TD3 algorithm on the D4RL benchmark of continuous control tasks [Fu et al., 2020]. We find that our algorithm compares favorably against many offline RL algorithms, while being significantly easier to implement and more than halving the required computation cost. |
| Researcher Affiliation | Collaboration | (1) Mila, McGill University; (2) Google Research, Brain Team |
| Pseudocode | No | The paper describes the algorithmic changes using mathematical equations (Equations 1, 2, 3, 4, 5) and textual descriptions, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | To accommodate reproducibility, all of our code is open-sourced (https://github.com/sfujim/TD3_BC). |
| Open Datasets | Yes | We evaluate our proposed approach on the D4RL benchmark of OpenAI gym MuJoCo tasks [Todorov et al., 2012, Brockman et al., 2016, Fu et al., 2020] |
| Dataset Splits | Yes | We train each algorithm for 1 million time steps and evaluate every 5000 time steps. Each evaluation consists of 10 episodes. |
| Hardware Specification | Yes | All run time experiments were run with a single GeForce GTX 1080 GPU and an Intel Core i7-6700K CPU at 4.00GHz. |
| Software Dependencies | No | The paper mentions using 'PyTorch [Paszke et al., 2019]' as a framework for re-implementing Fisher-BRC, but it does not specify version numbers for PyTorch or any other software dependencies crucial for reproducibility of the experiments described in the paper. |
| Experiment Setup | Yes | We train each algorithm for 1 million time steps and evaluate every 5000 time steps. Each evaluation consists of 10 episodes. ... We use α = 2.5 in our experiments. ... Secondly, we normalize the features of every state in the provided dataset. |
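
The Experiment Setup row quotes two concrete choices: α = 2.5 and per-feature normalization of dataset states. The following is a minimal NumPy sketch of how these might look in practice, not the authors' implementation. The function names, the small constant `EPS`, and the synthetic data are hypothetical. The scaling rule λ = α / mean(|Q|) is taken from our reading of the paper rather than from the table above, so treat it as an assumption.

```python
import numpy as np

ALPHA = 2.5  # value quoted in the Experiment Setup row
EPS = 1e-3   # assumed small constant guarding against zero variance


def normalize_states(states, eps=EPS):
    """Standardize every state feature using dataset-wide statistics.

    Returns the normalized states plus the mean and std, which would
    also be needed to normalize states seen at evaluation time.
    """
    mean = states.mean(axis=0, keepdims=True)
    std = states.std(axis=0, keepdims=True) + eps
    return (states - mean) / std, mean, std


def bc_weight(q_values, alpha=ALPHA):
    """Assumed scaling rule: lambda = alpha / mean(|Q|) over a mini-batch."""
    return alpha / np.abs(q_values).mean()


# Synthetic stand-in for a dataset of 1000 four-dimensional states.
rng = np.random.default_rng(0)
states = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))
normalized, mean, std = normalize_states(states)

# Weight on the Q term of the policy objective for a toy batch of Q-values.
lam = bc_weight(np.array([1.0, -3.0]))
```

After normalization each feature has approximately zero mean and unit standard deviation across the dataset; the stored `mean` and `std` would be reused on environment states during the 10-episode evaluations.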