Deep Conservative Policy Iteration
Authors: Nino Vieillard, Olivier Pietquin, Matthieu Geist
AAAI 2020, pp. 6070-6077
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment thoroughly the resulting algorithm on the simple Cartpole problem, and validate the proposed method on a representative subset of Atari games. |
| Researcher Affiliation | Industry | Google Research, Brain Team |
| Pseudocode | Yes | Algorithm 1 DCPI |
| Open Source Code | No | The paper does not provide any concrete access to the source code for the described methodology (no link or explicit statement of release). |
| Open Datasets | Yes | We use the version of Cartpole implemented in OpenAI Gym (Brockman et al. 2016)... We used the DQN implementation from the Dopamine library as our baseline... Atari is a challenging discrete-actions control environment, introduced by Bellemare et al. (2013), consisting of 57 games. |
| Dataset Splits | No | The paper describes training steps and evaluation on the environments (Cartpole, Atari), but it does not describe a validation split of a static dataset, nor how such a split would be generated or used for hyperparameter tuning separately from testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, cloud instance types). |
| Software Dependencies | No | The paper mentions using the "Dopamine library (Castro et al. 2018)" and "OpenAI Gym (Brockman et al. 2016)" but does not specify version numbers for these or for any other software dependencies. |
| Experiment Setup | Yes | Notably, we used the same network architecture for the q-network and the policy network and two identical Adam optimizers; we compute a gradient step every F = 4 interactions with the environment, and update the target networks every C = 100 interactions. Full parameters are reported in the Appendix. ... we chose β1 = 0.99... β2 = 0.9999. ... After a small hyperparameter search on a few games (Pong, Asterix and Space Invaders), we chose α0 = 1 and the Adamax mixture rate (see Eq. (8)). |
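
The Pseudocode row above only names Algorithm 1 (DCPI) without reproducing it. As background, the sketch below shows the classic conservative policy iteration mixture step that DCPI builds on, written for a tabular discrete-action case. The function name and array layout are ours, and the paper's deep, network-based instantiation of this update differs; this is a minimal illustration, not the authors' algorithm.

```python
# Hedged sketch (not the authors' code): the conservative mixture update that
# CPI-style methods use, for a tabular policy over discrete actions.
import numpy as np

def mixture_update(policy, q_values, alpha):
    """Return (1 - alpha) * policy + alpha * greedy(q_values).

    policy:   array of shape (n_states, n_actions); each row sums to 1.
    q_values: array of shape (n_states, n_actions); current Q estimates.
    alpha:    mixture rate in [0, 1].
    """
    greedy = np.zeros_like(policy)
    greedy[np.arange(policy.shape[0]), q_values.argmax(axis=1)] = 1.0
    return (1.0 - alpha) * policy + alpha * greedy
```

With alpha = 1 the update reduces to a plain greedy improvement step; smaller values of alpha move the policy only partway toward the greedy policy, which is the "conservative" part of the scheme.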
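The Open Datasets row cites the Gym version of Cartpole. The following minimal sketch shows how that environment is typically instantiated and stepped. The environment id (`CartPole-v0`), the random placeholder policy, and the classic 4-tuple `step` signature are assumptions about the paper-era Gym API, not details taken from the paper.

```python
# Hedged sketch: running one episode of Gym's Cartpole with a random policy.
# Uses the classic Gym API (reset() -> obs, step() -> 4-tuple); newer Gym
# releases return (obs, info) from reset() and a 5-tuple from step().
import gym

env = gym.make("CartPole-v0")
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()           # random action as a placeholder policy
    obs, reward, done, info = env.step(action)   # classic 4-tuple step signature
    episode_return += reward
print("episode return:", episode_return)
```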
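The Experiment Setup row quotes the paper's main hyperparameters. For reference, they are collected below into a plain Python dictionary. The key names are ours, the exact role of β1 and β2 is not restated here (the comments only echo the reported values), and the paper's appendix contains the full parameter list.

```python
# Hedged sketch (not from the paper's code): hyperparameters quoted in the
# Experiment Setup row, gathered into a plain dictionary. Key names are ours.
dcpi_reported_hyperparameters = {
    "gradient_step_period_F": 4,    # gradient step every F = 4 environment interactions
    "target_update_period_C": 100,  # target networks updated every C = 100 interactions
    "beta_1": 0.99,                 # reported beta_1
    "beta_2": 0.9999,               # reported beta_2
    "alpha_0": 1.0,                 # reported initial mixture rate alpha_0
}
# The mixture-rate schedule itself ("Adamax mixture rate", Eq. (8)) and the
# remaining parameters are given only in the paper's appendix.
```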