Learning Value Functions in Deep Policy Gradients using Residual Variance
Authors: Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms. Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights. |
| Researcher Affiliation | Academia | Yannis Flet-Berliac, Inria, Scool team, Univ. Lille, CRIStAL, CNRS, yannis.flet-berliac@inria.fr; Reda Ouhamma, Inria, Scool team, Univ. Lille, CRIStAL, CNRS, reda.ouhamma@inria.fr; Odalric-Ambrym Maillard, Inria, Scool team; Philippe Preux, Inria, Scool team, Univ. Lille, CRIStAL, CNRS |
| Pseudocode | Yes | Algorithm 1 AVEC coupled with PPO or TRPO. Algorithm 2 AVEC coupled with SAC. |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the methodology or provide a link to a code repository. |
| Open Datasets | Yes | For ease of comparison with other methods, we evaluate AVEC on the MuJoCo (Todorov et al., 2012) and the PyBullet (Coumans & Bai, 2016) continuous control benchmarks (see Appendix G for details) using OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper describes training and evaluation on continuous control environments for a certain number of timesteps. It does not provide explicit training, validation, and test dataset splits in the traditional sense for a fixed dataset. |
| Hardware Specification | No | The paper does not specify any particular CPU, GPU, or other hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using OpenAI Gym and specific RL algorithms (PPO, TRPO, SAC) but does not list specific software names with version numbers (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We provide the list of hyperparameters and further implementation details in Appendices D and E. In Tables 2, 3 and 4, we report the list of hyperparameters common to all continuous control experiments. |