Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning

Authors: Jongchan Park, Mingyu Park, Donghwan Lee

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on the D4RL, Robomimic, V-D4RL, and ExoRL benchmarks show that our method substantially improves both performance and data efficiency across diverse datasets and domains.
Researcher Affiliation	Collaboration	Jongchan Park (Hyundai Motor Company); Mingyu Park (KAIST); Donghwan Lee (KAIST)
Pseudocode	Yes	The complete pretraining procedure is summarized in Algorithm 1.
Open Source Code	Yes	Additionally, the source code is available in our GitHub repository (footnote 2: https://github.com/daisophila/PSQN.git).
Open Datasets	Yes	Extensive experiments on the D4RL, Robomimic, V-D4RL, and ExoRL benchmarks show that our method substantially improves both performance and data efficiency across diverse datasets and domains.
Dataset Splits	Yes	Each reduced dataset was constructed by uniformly sampling transition segments (s, a, r, s') from the full dataset, followed by both pretraining and RL training using these subsets. [...] on progressively reduced subsets of the D4RL datasets (1%, 3%, 10%, 30%, and 100%) spanning various data qualities...
Hardware Specification	Yes	All experiments were conducted on a single NVIDIA RTX A5000 GPU for both training and evaluation.
Software Dependencies	No	Specifically, we adopt publicly available PyTorch-based repositories for each baseline: (Insufficient detail as it does not include specific version numbers for PyTorch, Python, or other libraries).
Experiment Setup	Yes	For D4RL experiments, each agent is trained for 1 million gradient steps per environment across five random seeds. Evaluation is conducted every 5k gradient steps for AWAC, CQL, and TD3+BC, and every 10k steps for IQL, using five rollouts per evaluation.
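
The Dataset Splits row describes reduced datasets built by uniformly sampling transitions (s, a, r, s') at 1%, 3%, 10%, 30%, and 100% of the full data. The following is a minimal sketch of that idea, not the authors' code. The function name `subsample_transitions`, the toy dataset, the fixed seed, and the choice to sample without replacement are all assumptions made for illustration; the source does not specify them.

```python
import random


def subsample_transitions(transitions, fraction, seed=0):
    """Return a uniformly sampled subset of transitions.

    Hypothetical helper: samples without replacement (an assumption)
    and preserves the original ordering of the kept transitions.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Keep at least one transition, and never more than are available.
    n_keep = min(len(transitions), max(1, round(len(transitions) * fraction)))
    indices = sorted(rng.sample(range(len(transitions)), n_keep))
    return [transitions[i] for i in indices]


# Toy stand-in for an offline RL dataset: 1000 (s, a, r, s') tuples.
full = [(i, 0, 0.0, i + 1) for i in range(1000)]

# Fractions quoted in the Dataset Splits row.
fractions = (0.01, 0.03, 0.10, 0.30, 1.00)
subsets = {f: subsample_transitions(full, f) for f in fractions}
sizes = {f: len(s) for f, s in subsets.items()}
```

With a real benchmark, `full` would be replaced by the transitions loaded from the dataset, and the same subset would then be used for both pretraining and RL training, as the quoted passage states.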