Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Value Functions in Deep Policy Gradients using Residual Variance

Authors: Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

ICLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms. Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights.
Researcher Affiliation | Academia | Yannis Flet-Berliac, Inria, Scool team, Univ. Lille, CRIStAL, CNRS; Reda Ouhamma, Inria, Scool team, Univ. Lille, CRIStAL, CNRS; Odalric-Ambrym Maillard, Inria, Scool team; Philippe Preux, Inria, Scool team, Univ. Lille, CRIStAL, CNRS
Pseudocode | Yes | Algorithm 1: AVEC coupled with PPO or TRPO. Algorithm 2: AVEC coupled with SAC.
Open Source Code | No | The paper does not contain any statement about releasing source code for the methodology, nor a link to a code repository.
Open Datasets | Yes | For ease of comparison with other methods, we evaluate AVEC on the MuJoCo (Todorov et al., 2012) and the PyBullet (Coumans & Bai, 2016) continuous control benchmarks (see Appendix G for details) using OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper describes training and evaluation on continuous control environments for a given number of timesteps. It does not provide explicit training/validation/test splits in the traditional sense for a fixed dataset.
Hardware Specification | No | The paper does not specify any particular CPU, GPU, or other hardware used to run the experiments.
Software Dependencies | No | The paper mentions using OpenAI Gym and specific RL algorithms (PPO, TRPO, SAC) but does not list software packages with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We provide the list of hyperparameters and further implementation details in Appendix D and E. In Tables 2, 3 and 4, we report the list of hyperparameters common to all continuous control experiments.
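The residual-variance idea named in the paper's title can be illustrated with a minimal sketch (the function names, array shapes, and NumPy formulation here are assumptions for illustration, not the authors' implementation): the critic minimizes the variance of the residuals between empirical returns and value predictions, rather than their mean squared error.

```python
import numpy as np

def residual_variance_loss(values: np.ndarray, returns: np.ndarray) -> float:
    """Residual-variance critic loss: Var(R - V(s)).

    Subtracting the mean residual means a constant offset in the
    value estimates is not penalized, unlike a plain MSE loss.
    """
    residuals = returns - values
    return float(np.mean((residuals - residuals.mean()) ** 2))

def mse_loss(values: np.ndarray, returns: np.ndarray) -> float:
    """Standard mean-squared-error critic loss, for comparison."""
    return float(np.mean((returns - values) ** 2))
```

Shifting every value estimate by a constant leaves the residual-variance loss unchanged but increases the MSE, which highlights the difference between the two objectives.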