Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning

Authors: Jongchan Park, Mingyu Park, Donghwan Lee

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the D4RL, Robomimic, V-D4RL, and ExoRL benchmarks show that our method substantially improves both performance and data efficiency across diverse datasets and domains.
Researcher Affiliation | Collaboration | Jongchan Park (Hyundai Motor Company), Mingyu Park (KAIST), Donghwan Lee (KAIST)
Pseudocode | Yes | The complete pretraining procedure is summarized in Algorithm 1.
Open Source Code | Yes | Additionally, the source code is available in our GitHub repository (footnote 2: https://github.com/daisophila/PSQN.git).
Open Datasets | Yes | Extensive experiments on the D4RL, Robomimic, V-D4RL, and ExoRL benchmarks show that our method substantially improves both performance and data efficiency across diverse datasets and domains.
Dataset Splits | Yes | Each reduced dataset was constructed by uniformly sampling transition segments (s, a, r, s') from the full dataset, followed by both pretraining and RL training using these subsets. [...] on progressively reduced subsets of the D4RL datasets (1%, 3%, 10%, 30%, and 100%) spanning various data qualities...
Hardware Specification | Yes | All experiments were conducted on a single NVIDIA RTX A5000 GPU for both training and evaluation.
Software Dependencies | No | Specifically, we adopt publicly available PyTorch-based repositories for each baseline: (Insufficient detail, as it does not include specific version numbers for PyTorch, Python, or other libraries.)
Experiment Setup | Yes | For D4RL experiments, each agent is trained for 1 million gradient steps per environment across five random seeds. Evaluation is conducted every 5k gradient steps for AWAC, CQL, and TD3+BC, and every 10k steps for IQL, using five rollouts per evaluation.
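
The subset construction quoted under Dataset Splits (uniform sampling of (s, a, r, s') transitions at 1%, 3%, 10%, 30%, and 100%) can be illustrated with a minimal sketch. This is not the authors' code: the function name `subsample_transitions`, the toy dataset, sampling without replacement, and the fixed seed are all assumptions made for illustration, and the actual procedure in the paper's repository may differ.

```python
import random

# Subset fractions quoted in the Dataset Splits row.
FRACTIONS = (0.01, 0.03, 0.10, 0.30, 1.00)


def subsample_transitions(transitions, fraction, seed=0):
    """Hypothetical helper: keep a uniformly sampled fraction of transitions.

    Assumption: sampling is without replacement, and the original ordering
    of the kept transitions is preserved.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = len(transitions)
    k = max(1, round(n * fraction))  # number of transitions to keep
    rng = random.Random(seed)        # seeded so the subset is repeatable
    keep = sorted(rng.sample(range(n), k))
    return [transitions[i] for i in keep]


# Toy stand-in for an offline dataset: 1000 dummy (s, a, r, s') tuples.
toy_dataset = [(i, 0, float(i), i + 1) for i in range(1000)]

subsets = {f: subsample_transitions(toy_dataset, f) for f in FRACTIONS}
sizes = [len(subsets[f]) for f in FRACTIONS]
```

Fixing the seed makes each reduced dataset repeatable across runs, which matters when the same subset is meant to be used for both pretraining and RL training, as the quoted passage describes.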