Deterministic Value-Policy Gradients

Authors: Qingpeng Cai, Ling Pan, Pingzhong Tang (pp. 3316–3323)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines."
Researcher Affiliation | Collaboration | ¹Alibaba Group, ²IIIS, Tsinghua University
Pseudocode | Yes | Algorithm 1: The DVG(k) algorithm
Open Source Code | No | The paper does not provide an explicit statement or a link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | "We evaluate DVPG in a number of continuous control benchmark tasks in OpenAI Gym based on the MuJoCo simulator."
Dataset Splits | No | The paper describes training with minibatches and evaluates on continuous control benchmarks, but does not specify explicit training/validation/test splits (e.g., percentages or counts) or cite standard splits used for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions "OpenAI Gym based on the MuJoCo simulator" but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | "We evaluate the effect of the discount factor on DVG ... with different values from 0.6 to 0.99. We evaluate the effect of the weight of bootstrapping on DVPG with different values from 0.1 to 0.9, where the number of rollout steps is set to be 4. We evaluate the effect of the number of rollout steps ranging from 1 to 5."
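The three ablations quoted in the Experiment Setup row can be sketched as a simple sweep generator. Only the range endpoints (0.6–0.99, 0.1–0.9, 1–5) and the fixed rollout count of 4 come from the paper; the intermediate grid values and all names (`sweep_configs`, the config keys) are illustrative assumptions, not the authors' code.

```python
def sweep_configs():
    """Yield (ablation_name, config) pairs for the three reported sweeps.

    Intermediate grid points are assumptions; the paper states only the
    endpoints of each range.
    """
    # DVG discount-factor sweep: "different values from 0.6 to 0.99"
    for gamma in [0.6, 0.8, 0.9, 0.95, 0.99]:
        yield ("dvg_discount", {"gamma": gamma})
    # DVPG bootstrapping-weight sweep, with rollout steps fixed at 4
    for weight in [0.1, 0.3, 0.5, 0.7, 0.9]:
        yield ("dvpg_bootstrap", {"bootstrap_weight": weight, "rollout_steps": 4})
    # Rollout-steps sweep: "ranging from 1 to 5"
    for k in [1, 2, 3, 4, 5]:
        yield ("dvpg_rollout", {"rollout_steps": k})

configs = list(sweep_configs())
```

Each `(name, config)` pair would then be passed to whatever training entry point a reimplementation exposes; the paper itself releases no code, so the driver is left abstract here.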