Deterministic Value-Policy Gradients
Authors: Qingpeng Cai, Ling Pan, Pingzhong Tang (pp. 3316–3323)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines. |
| Researcher Affiliation | Collaboration | 1Alibaba Group 2IIIS, Tsinghua University |
| Pseudocode | Yes | Algorithm 1 The DVG(k) algorithm |
| Open Source Code | No | The paper does not provide an explicit statement or a link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate DVPG in a number of continuous control benchmark tasks in OpenAI Gym based on the MuJoCo simulator. |
| Dataset Splits | No | The paper describes training with minibatches and evaluates on continuous control benchmarks but does not specify explicit training/validation/test dataset splits (e.g., percentages or counts) or cite standard splits used for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym based on the MuJoCo simulator' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We evaluate the effect of the discount factor on DVG... with different values from 0.6 to 0.99. We evaluate the effect of the weight of bootstrapping on DVPG with different values from 0.1 to 0.9, where the number of rollout steps is set to be 4. We evaluate the effect of the number of rollout steps ranging from 1 to 5. |
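The experiment-setup row only reports hyperparameter *ranges* (discount factor 0.6 to 0.99, bootstrapping weight 0.1 to 0.9 with 4 rollout steps, and 1 to 5 rollout steps). A minimal sketch of the resulting sweep is below; the intermediate grid points and the `sweep_configs` helper are assumptions for illustration, not values stated in the paper.

```python
from itertools import product  # not strictly needed here; sweeps are independent

# Ranges reported in the paper's experiment setup; the specific
# intermediate grid points are assumed for illustration.
DISCOUNT_FACTORS = [0.6, 0.8, 0.9, 0.95, 0.99]   # effect of discount on DVG
BOOTSTRAP_WEIGHTS = [0.1, 0.3, 0.5, 0.7, 0.9]    # effect of bootstrapping weight on DVPG
ROLLOUT_STEPS = [1, 2, 3, 4, 5]                  # effect of the number of rollout steps

def sweep_configs():
    """Yield (study_name, config) pairs for the three ablations described.

    Each ablation varies one hyperparameter; the bootstrapping-weight
    sweep fixes the number of rollout steps at 4, as stated in the paper.
    """
    for gamma in DISCOUNT_FACTORS:
        yield "discount_factor", {"gamma": gamma}
    for w in BOOTSTRAP_WEIGHTS:
        yield "bootstrap_weight", {"weight": w, "rollout_steps": 4}
    for k in ROLLOUT_STEPS:
        yield "rollout_steps", {"rollout_steps": k}

if __name__ == "__main__":
    configs = list(sweep_configs())
    print(len(configs))  # 15 runs across the three ablations (under this assumed grid)
```

Each `config` dict would be passed to a training entry point (not shown here, since the paper releases no code); the three sweeps are independent rather than a full cross-product.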