Model-Augmented Actor-Critic: Backpropagating through Paths
Authors: Ignasi Clavera, Yao Fu, Pieter Abbeel
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation aims to examine the following questions: 1) How does MAAC compare against state-of-the-art model-based and model-free methods? ... In order to answer the posed questions, we evaluate our approach on model-based continuous control benchmark tasks in the MuJoCo simulator (Todorov et al., 2012; Wang et al., 2019). |
| Researcher Affiliation | Academia | Anonymous authors. Paper under double-blind review. |
| Pseudocode | Yes | Algorithm 1 MAAC |
| Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We evaluate our approach on model-based continuous control benchmark tasks in the MuJoCo simulator (Todorov et al., 2012; Wang et al., 2019). |
| Dataset Splits | Yes | The dynamics models are trained via maximum likelihood with early stopping on a validation set. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'MuJoCo simulator' but does not provide specific version numbers for it or any other software dependencies, libraries, or programming languages. |
| Experiment Setup | Yes | The horizon H in the proposed objective acts as a hyperparameter interpolating between model-free (when fully relying on the Q-function) and model-based (when using a longer horizon) behavior of the algorithm. In practice, two Q-functions are trained (Fujimoto et al., 2018), since this has been experimentally shown to yield better results. Algorithm 1 specifies βf, βπ, βQ as the learning rates for the model, policy, and Q-function, respectively (a sketch of this setup follows the table). |
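
The Experiment Setup row above refers to an H-step, pathwise policy objective with a terminal Q-function bootstrap and two Q-networks. Below is a minimal sketch of that idea, assuming a PyTorch-style implementation with a deterministic policy for brevity; it is not the authors' code. The network sizes, horizon `H`, discount, and learning rates (standing in for βf, βπ, βQ) are illustrative placeholders, not values reported in the paper.

```python
# Hedged sketch of an MAAC-style pathwise policy objective (not the authors' code).
# The policy is unrolled H steps through a learned dynamics model, model-predicted
# rewards are accumulated, and the return is bootstrapped with the minimum of two
# Q-functions. Gradients flow through the imagined path back into the policy.
import torch
import torch.nn as nn

obs_dim, act_dim, H, gamma = 8, 2, 5, 0.99  # placeholder dimensions and horizon

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(), nn.Linear(256, out))

dynamics = mlp(obs_dim + act_dim, obs_dim)    # learned model f(s, a) -> s'
reward_fn = mlp(obs_dim + act_dim, 1)         # learned (or given) reward model
policy = mlp(obs_dim, act_dim)                # deterministic policy for brevity
q1, q2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)  # two Q-functions

# Separate optimizers mirror the separate learning rates beta_f, beta_pi, beta_Q
# in Algorithm 1; the numeric values here are arbitrary placeholders.
opt_model = torch.optim.Adam(
    list(dynamics.parameters()) + list(reward_fn.parameters()), lr=1e-3)
opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_q = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)

def policy_objective(s):
    """H-step pathwise objective: rewards along an imagined rollout plus a
    discounted terminal value from the smaller of the two Q-functions."""
    ret, discount = 0.0, 1.0
    for _ in range(H):
        a = torch.tanh(policy(s))
        sa = torch.cat([s, a], dim=-1)
        ret = ret + discount * reward_fn(sa)
        s = dynamics(sa)                      # imagined next state from the model
        discount *= gamma
    a_H = torch.tanh(policy(s))
    sa_H = torch.cat([s, a_H], dim=-1)
    ret = ret + discount * torch.min(q1(sa_H), q2(sa_H))  # terminal Q bootstrap
    return -ret.mean()                        # maximize return -> minimize negative

# One illustrative policy update on a batch of starting states.
states = torch.randn(32, obs_dim)
loss = policy_objective(states)
opt_pi.zero_grad()
loss.backward()
opt_pi.step()
```

With H = 0 the objective reduces to the usual actor-critic update through the Q-function alone, while larger H leans more on the learned model, which is the hyperparameter trade-off the table row describes.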