Deterministic Value-Policy Gradients

Authors: Qingpeng Cai, Ling Pan, Pingzhong Tang (pp. 3316–3323)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines."
Researcher Affiliation | Collaboration | ¹Alibaba Group, ²IIIS, Tsinghua University
Pseudocode | Yes | Algorithm 1: The DVG(k) algorithm
Open Source Code | No | The paper does not provide an explicit statement or a link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | "We evaluate DVPG in a number of continuous control benchmark tasks in OpenAI Gym based on the MuJoCo simulator."
Dataset Splits | No | The paper describes training with minibatches and evaluates on continuous control benchmarks, but does not specify explicit training/validation/test splits (e.g., percentages or counts) or cite standard splits used for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions "OpenAI Gym based on the MuJoCo simulator" but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | "We evaluate the effect of the discount factor on DVG ... with different values from 0.6 to 0.99. We evaluate the effect of the weight of bootstrapping on DVPG with different values from 0.1 to 0.9, where the number of rollout steps is set to be 4. We evaluate the effect of the number of rollout steps ranging from 1 to 5."
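The three ablations quoted in the Experiment Setup row can be sketched as a simple sweep generator. Only the range endpoints (0.6–0.99, 0.1–0.9, 1–5) and the fixed rollout count of 4 come from the paper; the intermediate grid values and all names (`sweep_configs`, the config keys) are illustrative assumptions, not the authors' code.

```python
def sweep_configs():
    """Yield (ablation_name, config) pairs for the three reported sweeps.

    Intermediate grid points are assumptions; the paper states only the
    endpoints of each range.
    """
    # DVG discount-factor sweep: "different values from 0.6 to 0.99"
    for gamma in [0.6, 0.8, 0.9, 0.95, 0.99]:
        yield ("dvg_discount", {"gamma": gamma})
    # DVPG bootstrapping-weight sweep, with rollout steps fixed at 4
    for weight in [0.1, 0.3, 0.5, 0.7, 0.9]:
        yield ("dvpg_bootstrap", {"bootstrap_weight": weight, "rollout_steps": 4})
    # Rollout-steps sweep: "ranging from 1 to 5"
    for k in [1, 2, 3, 4, 5]:
        yield ("dvpg_rollout", {"rollout_steps": k})

configs = list(sweep_configs())
```

Each `(name, config)` pair would then be passed to whatever training entry point a reimplementation exposes; the paper itself releases no code, so the driver is left abstract here.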