Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Prioritized Generative Replay
Authors: Renhao Wang, Kevin Frans, Pieter Abbeel, Sergey Levine, Alexei A. Efros
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both state-based and pixel-based RL tasks show PGR is consistently more sample-efficient than both model-free RL algorithms and generative approaches that do not use any guidance. In fact, by densifying the more relevant transitions, PGR is able to succeed in cases where unconditional generation struggles significantly. Moreover, we empirically demonstrate that PGR goes beyond simple prioritized experience replay; in particular, we show that conditioning on curiosity leads to more diverse and more learning-relevant generations. |
| Researcher Affiliation | Academia | Renhao Wang, Kevin Frans, Pieter Abbeel, Sergey Levine, and Alexei A. Efros, Department of Electrical Engineering and Computer Science, University of California, Berkeley. PA holds concurrent appointments as a Professor at UC Berkeley and as an Amazon Scholar. This paper describes work performed at UC Berkeley and is not associated with Amazon. |
| Pseudocode | Yes | Algorithm 1 Overview of our outer loop + inner loop framework. |
| Open Source Code | No | Project page available at: https://pgenreplay.github.io. This is a project page, not a direct link to a source-code repository, and the paper does not contain an unambiguous statement of code release for the methodology described. |
| Open Datasets | Yes | Our results span a range of state-based and pixel-based tasks in the DeepMind Control Suite (DMC) (Tunyasuvunakool et al., 2020) and OpenAI Gym (Brockman et al., 2016) environments. |
| Dataset Splits | Yes | In all tasks, we allow 100K environment interactions, a standard choice in online RL (Li et al., 2023; Kostrikov et al., 2020). Our benchmark follows exactly the online RL evaluation suite of prior work in generative RL by Lu et al. (2024), facilitating direct comparison. |
| Hardware Specification | No | Generation VRAM (GB): 4.31 vs. 6.67 (+54.7%). Generation also fits easily on modern GPUs (<12GB). The paper mentions VRAM usage and general GPU types, but does not provide specific models or detailed specifications of the hardware used. |
| Software Dependencies | No | The paper refers to algorithms and models like SAC (Haarnoja et al., 2018), REDQ (Chen et al., 2021), DrQ-v2 (Yarats et al., 2021), and PPO (Schulman et al., 2017), but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | during training we randomly discard the scalar given by our relevance function F with probability 0.25. Implementation-wise, we keep both Dreal and Dsyn at 1M transitions, and randomly sample synthetic and real data mixed according to some ratio r to train our policy π. ...we increase the number of hidden layers from 2 to 3, and their widths from 256 to 512. ...we also increase batch size from 256 to 1024 to maintain per-parameter throughput. We now double the batch size to 512 and then to 1024, each time scaling r to 0.75 and 0.875, respectively. Finally, we double the UTD from 20 to 40, and the size of the synthetic data buffer Dsyn, from 1M to 2M transitions. Specifically, we parameterize the neural networks as three-layer CNNs, with bottleneck latent dimension 64 and feature output dimension 512. The CNNs are followed by a two-layer MLP projection, also of dimension 512. We resize the visual observations to 42 × 42 pixels, and use 8 context bins. We directly use the recommended hyperparameters in Savinov et al. (2018), setting α = 0.03, β = 0.5, M = 200, and F = percentile-90. The embedder network E is a ResNet-18 architecture with output dimension 512, followed by a four-layer MLP, also with feature and output dimensions of 512. |
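The "Pseudocode" row references the paper's Algorithm 1, an outer loop + inner loop framework. A minimal sketch of that structure is below; all of the callables (`collect`, `generate`, `relevance`, `update`) are illustrative placeholders standing in for environment rollout, relevance-conditioned generation, the relevance function F, and a policy gradient step, and are not the paper's actual API.

```python
import random

def pgr_train(collect, generate, relevance, update,
              n_outer=2, n_inner=3, batch_size=4, r=0.5):
    """Sketch of a PGR-style outer/inner training loop.

    collect()        -> list of real transitions from the environment
    generate(scores) -> list of synthetic transitions conditioned on
                        per-transition relevance scores
    relevance(t)     -> scalar relevance (e.g. curiosity) of transition t
    update(batch)    -> one policy/critic update on a mixed batch
    """
    d_real, d_syn = [], []
    for _ in range(n_outer):
        # Outer loop: gather real experience, then densify the most
        # learning-relevant transitions via conditional generation.
        d_real.extend(collect())
        d_syn.extend(generate([relevance(t) for t in d_real]))
        for _ in range(n_inner):
            # Inner loop: train on batches mixing synthetic and real
            # data according to ratio r (fraction drawn from d_syn).
            n_syn = int(batch_size * r)
            batch = (random.sample(d_syn, min(n_syn, len(d_syn))) +
                     random.sample(d_real, batch_size - n_syn))
            update(batch)
    return d_real, d_syn
```

The mixing ratio `r` here corresponds to the paper's batch-composition ratio, which the setup row reports scaling to 0.75 and then 0.875 as batch size doubles.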