Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
d3rlpy: An Offline Deep Reinforcement Learning Library
Authors: Takuma Seno, Michita Imai
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To address a reproducibility issue, we conduct a large-scale benchmark with D4RL and Atari 2600 dataset to ensure implementation quality and provide experimental scripts and full tables of results. |
| Researcher Affiliation | Collaboration | Takuma Seno (Keio University, Kanagawa, Japan; Sony AI, Tokyo, Japan); Michita Imai (Keio University, Kanagawa, Japan) |
| Pseudocode | No | The paper includes a 'Library interface' section with Python code examples showing how to use the d3rlpy library, but it does not contain formal pseudocode or algorithm blocks describing the underlying algorithms implemented within the library. |
| Open Source Code | Yes | The d3rlpy source code can be found on GitHub: https://github.com/takuseno/d3rlpy. The full Python scripts used in this benchmark are also included in our source code (https://github.com/takuseno/d3rlpy/tree/master/reproductions), which allows users to conduct additional benchmark experiments. |
| Open Datasets | Yes | To address a reproducibility issue, we conduct a large-scale benchmark with D4RL and Atari 2600 dataset to ensure implementation quality and provide experimental scripts and full tables of results. The popular benchmark datasets such as D4RL and Atari 2600 datasets are also provided by d3rlpy.datasets package that converts them into MDPDataset object. |
| Dataset Splits | Yes | We used 1% portion of transitions (500K datapoints) and train each algorithm for 12.5M gradient steps and evaluate every 125K steps to collect evaluation performance in environments for 10 episodes. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions 'use_gpu=0' in a code example, which is a software parameter rather than a hardware specification for the experimental setup. |
| Software Dependencies | No | d3rlpy provides a set of off-policy offline and online RL algorithms built with PyTorch (Paszke et al., 2019). The paper mentions Python, PyTorch, a scikit-learn-styled API, and the Adam optimizer, but does not specify exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Table 1 shows hyperparameters used in benchmarking. We used the same hyperparameters as the ones previously reported in previous papers or recommended in author-provided repositories. We used discount factor of 0.99, target update rate of 5e-3 and an Adam optimizer (Kingma and Ba, 2014) across all algorithms. The default architecture was MLP with hidden layers of [256, 256] unless we explicitly address it. We repeated all experiments with 10 random seeds. |
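The "target update rate of 5e-3" in the experiment setup refers to Polyak (soft) target-network averaging, a standard component of the Q-learning-style algorithms d3rlpy implements. As an illustration only (this is not d3rlpy's implementation; the function name and the use of plain float lists as stand-ins for network weights are assumptions), a minimal sketch:

```python
# Illustrative sketch of Polyak (soft) target-network averaging with
# tau = 5e-3, the target update rate reported in the paper's setup.
# Not d3rlpy's actual code; lists of floats stand in for network weights.

TAU = 5e-3  # target update rate from the benchmark configuration


def soft_update(online, target, tau=TAU):
    """Blend online weights into target weights: t <- tau*o + (1-tau)*t."""
    return [tau * o + (1 - tau) * t for o, t in zip(online, target)]


online = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(online, target)
print(target)  # -> [0.005, 0.01]
```

With tau this small, the target network tracks the online network slowly, which is what stabilizes bootstrapped value targets across all the benchmarked algorithms.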
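The Atari evaluation protocol quoted under Dataset Splits (12.5M gradient steps, evaluation every 125K steps, 10 episodes per evaluation, repeated over 10 seeds) implies a fixed evaluation budget. A small sketch of that arithmetic (the variable names are illustrative, not part of d3rlpy):

```python
# Evaluation-schedule arithmetic for the Atari benchmark described above.
# All constants come from the quoted experiment setup; the computation
# itself is an illustration, not d3rlpy code.

TOTAL_STEPS = 12_500_000   # gradient steps per run
EVAL_INTERVAL = 125_000    # evaluate every 125K steps
EPISODES_PER_EVAL = 10     # episodes collected at each evaluation
SEEDS = 10                 # random seeds per algorithm/task

n_evals = TOTAL_STEPS // EVAL_INTERVAL          # checkpoints per run
episodes_per_run = n_evals * EPISODES_PER_EVAL  # eval episodes per seed
total_episodes = episodes_per_run * SEEDS       # eval episodes per config

print(n_evals, episodes_per_run, total_episodes)  # -> 100 1000 10000
```

That is 100 evaluation checkpoints per run and 10,000 evaluation episodes per algorithm-task pair, which gives a sense of the scale behind the reported benchmark tables.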