Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient Online Reinforcement Learning with Offline Data

Authors: Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine

ICML 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a 2.5× improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.
Researcher Affiliation Academia 1 University of Oxford, 2 UC Berkeley. Correspondence to: Philip J. Ball <EMAIL>, Laura Smith <EMAIL>, Ilya Kostrikov <EMAIL>.
Pseudocode Yes Algorithm 1 Online RL with Offline Data (RLPD)
Open Source Code Yes We have released our code here: github.com/ikostrikov/rlpd.
Open Datasets Yes Sparse Adroit (Nair et al., 2020). D4RL Ant Maze (Fu et al., 2020). D4RL Locomotion (Fu et al., 2020). V-D4RL (Lu et al., 2022)
Dataset Splits No No specific details about train/validation/test dataset splits, percentages, or explicit sample counts were found.
Hardware Specification No The paper mentions using 'the Savio computational cluster resource provided by the Berkeley Research Computing program at the University of California, Berkeley' but does not specify any particular GPU models, CPU models, or detailed hardware configurations used for the experiments.
Software Dependencies No The paper states the codebase is 'written in JAX (Bradbury et al., 2018)' but does not provide specific version numbers for JAX or any other software dependencies, such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup Yes Table 1. RLPD hyperparameters: Online batch size 128; Offline batch size 128; Discount (γ) 0.99; Optimizer Adam; Learning rate 3×10⁻⁴; Ensemble size (E) 10; Critic EMA weight (ρ) 0.005; Gradient steps, state-based (G, or UTD) 20; Network width 256 units; Initial entropy temperature (α) 1.0; Target entropy −dim(A)/2.
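The matched online and offline batch sizes in Table 1 reflect RLPD's symmetric sampling: each gradient step draws equal-sized batches from the online replay buffer and the offline dataset. The sketch below is an illustrative reconstruction, not the authors' released code (github.com/ikostrikov/rlpd); the function and variable names are assumed, and only the hyperparameter values are taken from Table 1.

```python
# Hedged sketch of RLPD-style 50/50 symmetric sampling using the
# Table 1 hyperparameters. Names here are illustrative assumptions;
# the released implementation is written in JAX and differs in detail.
import random

HPARAMS = {
    "online_batch_size": 128,
    "offline_batch_size": 128,
    "discount": 0.99,
    "learning_rate": 3e-4,         # 3×10⁻⁴, Adam
    "ensemble_size": 10,           # E
    "critic_ema_weight": 0.005,    # ρ
    "utd_ratio": 20,               # gradient steps per env step (state-based)
    "network_width": 256,
    "init_entropy_temperature": 1.0,
}

def sample_symmetric_batch(online_buffer, offline_buffer, hp=HPARAMS):
    """Draw equal-sized batches from online and offline data, then combine."""
    online = random.sample(online_buffer, hp["online_batch_size"])
    offline = random.sample(offline_buffer, hp["offline_batch_size"])
    return online + offline

# Stand-in buffers of transition indices, purely for demonstration.
online_buffer = list(range(1000))
offline_buffer = list(range(1000, 5000))
batch = sample_symmetric_batch(online_buffer, offline_buffer)
print(len(batch))  # 256: half online, half offline
```

In practice each UTD step (20 per environment step here) would draw a fresh symmetric batch, so the effective update batch is always an even online/offline mix regardless of buffer sizes.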