Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Value Functions in Deep Policy Gradients using Residual Variance

Authors: Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

ICLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms. Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights.
Researcher Affiliation | Academia | Yannis Flet-Berliac, Inria, Scool team, Univ. Lille, CRIStAL, CNRS; Reda Ouhamma, Inria, Scool team, Univ. Lille, CRIStAL, CNRS; Odalric-Ambrym Maillard, Inria, Scool team; Philippe Preux, Inria, Scool team, Univ. Lille, CRIStAL, CNRS
Pseudocode | Yes | Algorithm 1: AVEC coupled with PPO or TRPO. Algorithm 2: AVEC coupled with SAC.
Open Source Code | No | The paper does not contain any statement about releasing source code for the methodology, nor a link to a code repository.
Open Datasets | Yes | For ease of comparison with other methods, we evaluate AVEC on the MuJoCo (Todorov et al., 2012) and the PyBullet (Coumans & Bai, 2016) continuous control benchmarks (see Appendix G for details) using OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper describes training and evaluation on continuous control environments for a given number of timesteps. It does not provide explicit training/validation/test splits in the traditional sense for a fixed dataset.
Hardware Specification | No | The paper does not specify any particular CPU, GPU, or other hardware used to run the experiments.
Software Dependencies | No | The paper mentions using OpenAI Gym and specific RL algorithms (PPO, TRPO, SAC) but does not list software packages with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We provide the list of hyperparameters and further implementation details in Appendix D and E. In Tables 2, 3 and 4, we report the list of hyperparameters common to all continuous control experiments.
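The residual-variance idea named in the paper's title can be illustrated with a minimal sketch (the function names, array shapes, and NumPy formulation here are assumptions for illustration, not the authors' implementation): the critic minimizes the variance of the residuals between empirical returns and value predictions, rather than their mean squared error.

```python
import numpy as np

def residual_variance_loss(values: np.ndarray, returns: np.ndarray) -> float:
    """Residual-variance critic loss: Var(R - V(s)).

    Subtracting the mean residual means a constant offset in the
    value estimates is not penalized, unlike a plain MSE loss.
    """
    residuals = returns - values
    return float(np.mean((residuals - residuals.mean()) ** 2))

def mse_loss(values: np.ndarray, returns: np.ndarray) -> float:
    """Standard mean-squared-error critic loss, for comparison."""
    return float(np.mean((returns - values) ** 2))
```

Shifting every value estimate by a constant leaves the residual-variance loss unchanged but increases the MSE, which highlights the difference between the two objectives.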