Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints
Authors: Anikait Singh, Aviral Kumar, Quan Vuong, Yevgen Chebotar, Sergey Levine
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the D4RL benchmark suite (Fu et al., 2020) and demonstrate that REDS significantly outperforms existing offline RL baselines on several benchmarks across different domains and difficulties. |
| Researcher Affiliation | Industry | Yuanfu Liao Google Research EMAIL George Tucker Google Research EMAIL Ofir Nachum Google Research EMAIL |
| Pseudocode | Yes | Algorithm 1 Robust Exploration with Dataset Heteroskedasticity via Support Constraints (REDS) |
| Open Source Code | Yes | Code is available at github.com/google-research/reds |
| Open Datasets | Yes | We conduct experiments on the D4RL benchmark suite (Fu et al., 2020), which consists of a set of locomotion and AntMaze tasks from continuous control, as well as Adroit and FrankaKitchen tasks with challenging robot manipulation datasets. |
| Dataset Splits | Yes | We use the standard D4RL splits for training, validation, and evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU/CPU models or cloud instance types. |
| Software Dependencies | No | We implement REDS using the Jax and Flax libraries. The paper does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train all models with a batch size of 256, a discount factor of 0.99, and learning rates of 10⁻⁴ for the policy and Q-functions, and 10⁻⁵ for the support function and β. |
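
For readers trying to reproduce the setup, the hyperparameters quoted in the Experiment Setup row could be collected in a small configuration object. The following is a minimal sketch only: the class name `TrainConfig` and all field names are hypothetical and are not taken from the paper or its code release. Only the numeric values come from the quoted text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row.

    Field names are illustrative assumptions; only the values are from the table.
    """
    batch_size: int = 256      # "batch size of 256"
    discount: float = 0.99     # "discount factor of 0.99"
    policy_lr: float = 1e-4    # learning rate for the policy
    q_lr: float = 1e-4         # learning rate for the Q-functions
    support_lr: float = 1e-5   # learning rate for the support function
    beta_lr: float = 1e-5      # learning rate for beta


config = TrainConfig()
print(asdict(config))
```

The dataclass is frozen so that the values cannot be changed by accident during a run. Settings the quoted text does not mention, such as the optimizer, network sizes, or number of training steps, are deliberately left out.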