Learning Value Functions in Deep Policy Gradients using Residual Variance

Authors: Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms. Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights."
Researcher Affiliation | Academia | Yannis Flet-Berliac (Inria, Scool team; Univ. Lille, CRIStAL, CNRS; yannis.flet-berliac@inria.fr); Reda Ouhamma (Inria, Scool team; Univ. Lille, CRIStAL, CNRS; reda.ouhamma@inria.fr); Odalric-Ambrym Maillard (Inria, Scool team); Philippe Preux (Inria, Scool team; Univ. Lille, CRIStAL, CNRS)
Pseudocode | Yes | Algorithm 1: AVEC coupled with PPO or TRPO. Algorithm 2: AVEC coupled with SAC.
Open Source Code | No | The paper contains no statement about releasing source code for the methodology and provides no link to a code repository.
Open Datasets | Yes | "For ease of comparison with other methods, we evaluate AVEC on the MuJoCo (Todorov et al., 2012) and the PyBullet (Coumans & Bai, 2016) continuous control benchmarks (see Appendix G for details) using OpenAI Gym (Brockman et al., 2016)."
Dataset Splits | No | The paper describes training and evaluating on continuous control environments for a given number of timesteps; it does not provide explicit training/validation/test splits in the traditional sense for a fixed dataset.
Hardware Specification | No | The paper does not specify the CPU, GPU, or other hardware used to run the experiments.
Software Dependencies | No | The paper mentions OpenAI Gym and specific RL algorithms (PPO, TRPO, SAC) but does not list software names with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | "We provide the list of hyperparameters and further implementation details in Appendix D and E." Tables 2, 3 and 4 report the hyperparameters common to all continuous control experiments.
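For context on what the evaluated method changes: as the paper's title indicates, AVEC trains the critic on the residual variance of the value-function errors rather than their mean squared error. The sketch below illustrates that idea numerically; it is not the authors' implementation, and the function names are our own. The key property shown is that a variance-based loss is invariant to a constant offset in the value estimates, unlike the usual MSE loss.

```python
import numpy as np

def mse_value_loss(values, returns):
    """Standard critic objective: mean squared residual."""
    residuals = returns - values
    return float(np.mean(residuals ** 2))

def residual_variance_loss(values, returns):
    """AVEC-style critic objective (our sketch): the empirical
    variance of the residuals, i.e. mean squared deviation of each
    residual from the batch-mean residual."""
    residuals = returns - values
    return float(np.mean((residuals - residuals.mean()) ** 2))

# Shifting every value estimate by a constant changes the MSE loss
# but leaves the residual-variance loss unchanged, since variance
# is shift-invariant.
values = np.array([1.0, 2.0, 3.0])
returns = np.array([1.5, 2.5, 2.0])
print(residual_variance_loss(values, returns))         # 0.5
print(residual_variance_loss(values + 10.0, returns))  # 0.5 (unchanged)
print(mse_value_loss(values + 10.0, returns))          # much larger
```

In a policy-gradient loop, this loss would simply replace the critic's MSE term while the actor update is left untouched.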