Efficient Online Reinforcement Learning with Offline Data

Authors: Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a 2.5× improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.
Researcher Affiliation | Academia | ¹University of Oxford, ²UC Berkeley. Correspondence to: Philip J. Ball <ball@robots.ox.ac.uk>, Laura Smith <smithlaura@berkeley.edu>, Ilya Kostrikov <kostrikov@berkeley.edu>.
Pseudocode | Yes | Algorithm 1: Online RL with Offline Data (RLPD). (A sketch of this training loop appears below the table.)
Open Source Code | Yes | We have released our code here: github.com/ikostrikov/rlpd.
Open Datasets | Yes | Sparse Adroit (Nair et al., 2020); D4RL Ant Maze (Fu et al., 2020); D4RL Locomotion (Fu et al., 2020); V-D4RL (Lu et al., 2022). (A hypothetical loading example appears below the table.)
Dataset Splits | No | No specific details about train/validation/test dataset splits, percentages, or explicit sample counts were found.
Hardware Specification | No | The paper mentions using 'the Savio computational cluster resource provided by the Berkeley Research Computing program at the University of California, Berkeley' but does not specify any particular GPU models, CPU models, or detailed hardware configurations used for the experiments.
Software Dependencies | No | The paper states the codebase is 'written in JAX (Bradbury et al., 2018)' but does not provide specific version numbers for JAX or any other software dependencies, such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | Table 1 (RLPD hyperparameters): Online batch size 128; Offline batch size 128; Discount (γ) 0.99; Optimizer Adam; Learning rate 3 × 10⁻⁴; Ensemble size (E) 10; Critic EMA weight (ρ) 0.005; Gradient steps (state-based; G or UTD) 20; Network width 256 units; Initial entropy temperature (α) 1.0; Target entropy dim(A)/2. (These values are gathered into a config sketch below the table.)
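
For convenience, the Table 1 values above can be gathered into a single configuration dictionary. This is a minimal sketch only; the field names are illustrative and are not taken from the released codebase.

```python
# Table 1 hyperparameters collected into a plain config dict (illustrative names).
rlpd_config = {
    "online_batch_size": 128,
    "offline_batch_size": 128,
    "discount": 0.99,                 # gamma
    "optimizer": "adam",
    "learning_rate": 3e-4,
    "ensemble_size": 10,              # E critics
    "critic_ema_weight": 0.005,       # rho, target-network EMA
    "utd_ratio": 20,                  # gradient steps per env step (state-based)
    "network_width": 256,             # hidden units per layer
    "init_entropy_temperature": 1.0,  # alpha
    "target_entropy": "dim(A)/2",     # as listed in the extracted table
}
```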
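The datasets listed in the Open Datasets row are distributed through the `d4rl` package. The snippet below is a hypothetical loading example; the environment id `antmaze-umaze-v2` is an illustrative choice and not prescribed by the paper or this report.

```python
# Hypothetical example of fetching one D4RL offline dataset as transition arrays.
import gym
import d4rl  # importing d4rl registers the D4RL environments with gym

env = gym.make("antmaze-umaze-v2")          # illustrative environment id
dataset = d4rl.qlearning_dataset(env)       # dict of observations, actions, rewards, ...

print(dataset["observations"].shape, dataset["actions"].shape, dataset["rewards"].shape)
```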
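Finally, the Pseudocode row refers to Algorithm 1, whose central design choice is symmetric sampling: each update batch mixes equal halves of online and offline data, applied at a high update-to-data (UTD) ratio. The sketch below illustrates only that loop structure; the agent, environment, and buffers are stand-in placeholders, not the authors' JAX implementation.

```python
# Minimal sketch of the symmetric-sampling loop (Algorithm 1 structure only).
import numpy as np

rng = np.random.default_rng(0)

def sample(buffer, n):
    """Uniformly sample n transitions from a list-like replay buffer."""
    idx = rng.integers(0, len(buffer), size=n)
    return [buffer[i] for i in idx]

def agent_update(agent_state, batch):
    """Placeholder for one SAC-style gradient step on the actor/critic ensemble."""
    agent_state["updates"] += 1
    return agent_state

def agent_act(agent_state, obs):
    """Placeholder policy: random action in [-1, 1]."""
    return rng.uniform(-1.0, 1.0, size=1)

# Batch sizes and UTD ratio from Table 1.
ONLINE_BATCH, OFFLINE_BATCH, UTD = 128, 128, 20

# Pre-collected offline data (dummy transitions stand in for a D4RL dataset).
offline_buffer = [{"obs": np.zeros(3), "act": np.zeros(1), "rew": 0.0,
                   "next_obs": np.zeros(3), "done": False} for _ in range(1000)]
online_buffer = []
agent_state = {"updates": 0}

obs = np.zeros(3)  # stand-in for env.reset()
for step in range(100):
    act = agent_act(agent_state, obs)
    next_obs, rew, done = np.zeros(3), 0.0, False  # stand-in for env.step(act)
    online_buffer.append({"obs": obs, "act": act, "rew": rew,
                          "next_obs": next_obs, "done": done})
    obs = next_obs
    # Symmetric sampling: every update batch is 50% online and 50% offline data,
    # repeated UTD times per environment step.
    for _ in range(UTD):
        batch = sample(online_buffer, ONLINE_BATCH) + sample(offline_buffer, OFFLINE_BATCH)
        agent_state = agent_update(agent_state, batch)
```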