Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
d3rlpy: An Offline Deep Reinforcement Learning Library
Authors: Takuma Seno, Michita Imai
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To address a reproducibility issue, we conduct a large-scale benchmark with D4RL and Atari 2600 dataset to ensure implementation quality and provide experimental scripts and full tables of results. |
| Researcher Affiliation | Collaboration | Takuma Seno (Keio University, Kanagawa, Japan; Sony AI, Tokyo, Japan); Michita Imai (Keio University, Kanagawa, Japan) |
| Pseudocode | No | The paper includes a 'Library interface' section with Python code examples showing how to use the d3rlpy library, but it does not contain formal pseudocode or algorithm blocks describing the underlying algorithms implemented within the library. |
| Open Source Code | Yes | The d3rlpy source code can be found on GitHub: https://github.com/takuseno/d3rlpy. The full Python scripts used in this benchmark are also included in our source code (https://github.com/takuseno/d3rlpy/tree/master/reproductions), which allows users to conduct additional benchmark experiments. |
| Open Datasets | Yes | To address a reproducibility issue, we conduct a large-scale benchmark with D4RL and Atari 2600 dataset to ensure implementation quality and provide experimental scripts and full tables of results. The popular benchmark datasets such as D4RL and Atari 2600 datasets are also provided by d3rlpy.datasets package that converts them into MDPDataset object. |
| Dataset Splits | Yes | We used 1% portion of transitions (500K datapoints) and train each algorithm for 12.5M gradient steps and evaluate every 125K steps to collect evaluation performance in environments for 10 episodes. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions 'use_gpu=0' in a code example, which is a software parameter rather than a hardware specification for the experimental setup. |
| Software Dependencies | No | d3rlpy provides a set of off-policy offline and online RL algorithms built with PyTorch (Paszke et al., 2019). The paper mentions Python, PyTorch, a scikit-learn-styled API, and the Adam optimizer, but does not specify exact version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Table 1 shows hyperparameters used in benchmarking. We used the same hyperparameters as the ones previously reported in previous papers or recommended in author-provided repositories. We used discount factor of 0.99, target update rate of 5e-3 and an Adam optimizer (Kingma and Ba, 2014) across all algorithms. The default architecture was MLP with hidden layers of [256, 256] unless we explicitly address it. We repeated all experiments with 10 random seeds. |
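The "target update rate of 5e-3" in the experiment setup refers to Polyak (soft) target-network averaging, a standard component of the Q-learning-style algorithms d3rlpy implements. As an illustration only (this is not d3rlpy's implementation; the function name and the use of plain float lists as stand-ins for network weights are assumptions), a minimal sketch:

```python
# Illustrative sketch of Polyak (soft) target-network averaging with
# tau = 5e-3, the target update rate reported in the paper's setup.
# Not d3rlpy's actual code; lists of floats stand in for network weights.

TAU = 5e-3  # target update rate from the benchmark configuration


def soft_update(online, target, tau=TAU):
    """Blend online weights into target weights: t <- tau*o + (1-tau)*t."""
    return [tau * o + (1 - tau) * t for o, t in zip(online, target)]


online = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(online, target)
print(target)  # -> [0.005, 0.01]
```

With tau this small, the target network tracks the online network slowly, which is what stabilizes bootstrapped value targets across all the benchmarked algorithms.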
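The Atari evaluation protocol quoted under Dataset Splits (12.5M gradient steps, evaluation every 125K steps, 10 episodes per evaluation, repeated over 10 seeds) implies a fixed evaluation budget. A small sketch of that arithmetic (the variable names are illustrative, not part of d3rlpy):

```python
# Evaluation-schedule arithmetic for the Atari benchmark described above.
# All constants come from the quoted experiment setup; the computation
# itself is an illustration, not d3rlpy code.

TOTAL_STEPS = 12_500_000   # gradient steps per run
EVAL_INTERVAL = 125_000    # evaluate every 125K steps
EPISODES_PER_EVAL = 10     # episodes collected at each evaluation
SEEDS = 10                 # random seeds per algorithm/task

n_evals = TOTAL_STEPS // EVAL_INTERVAL          # checkpoints per run
episodes_per_run = n_evals * EPISODES_PER_EVAL  # eval episodes per seed
total_episodes = episodes_per_run * SEEDS       # eval episodes per config

print(n_evals, episodes_per_run, total_episodes)  # -> 100 1000 10000
```

That is 100 evaluation checkpoints per run and 10,000 evaluation episodes per algorithm-task pair, which gives a sense of the scale behind the reported benchmark tables.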