Model-Augmented Actor-Critic: Backpropagating through Paths

Authors: Ignasi Clavera, Yao Fu, Pieter Abbeel

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental evaluation aims to examine the following questions: 1) How does MAAC compare against state-of-the-art model-based and model-free methods? ... In order to answer the posed questions, we evaluate our approach on model-based continuous control benchmark tasks in the MuJoCo simulator (Todorov et al., 2012; Wang et al., 2019).
Researcher Affiliation | Academia | Anonymous authors; paper under double-blind review.
Pseudocode | Yes | Algorithm 1 MAAC
Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available.
Open Datasets | Yes | We evaluate our approach on model-based continuous control benchmark tasks in the MuJoCo simulator (Todorov et al., 2012; Wang et al., 2019).
Dataset Splits | Yes | The dynamics models are trained via maximum likelihood with early stopping on a validation set. (See the training sketch after the table.)
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory amounts) used for running the experiments.
Software Dependencies | No | The paper mentions the 'MuJoCo simulator' but does not provide specific version numbers for it or for any other software dependencies, libraries, or programming languages.
Experiment Setup | Yes | The horizon H in our proposed objective acts as a hyperparameter that trades off between the model-free (fully relying on the Q-function) and model-based (using a longer horizon) regimes of our algorithm. In practice, we train two Q-functions (Fujimoto et al., 2018), since this has been experimentally shown to yield better results. Algorithm 1 lists βf, βπ, and βQ as the learning rates for the model, policy, and Q-function, respectively. (See the objective sketch after the table.)
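The Dataset Splits row notes that the dynamics models are trained via maximum likelihood with early stopping on a validation set. The PyTorch-style sketch below illustrates one plausible way this could be done; the function names (nll_loss, train_dynamics_model), the Gaussian output parameterization, and the patience value are illustrative assumptions, not details taken from the paper.

```python
import copy
import torch

def nll_loss(model, s, a, s_next):
    # Negative log-likelihood of the observed next state under a Gaussian
    # dynamics model that outputs a mean and log standard deviation (assumed form).
    mean, log_std = model(s, a)
    dist = torch.distributions.Normal(mean, log_std.exp())
    return -dist.log_prob(s_next).sum(dim=-1).mean()

def train_dynamics_model(model, train_loader, val_loader, beta_f,
                         max_epochs=200, patience=5):
    """Sketch: maximum-likelihood training with early stopping on a validation set.
    beta_f plays the role of the model learning rate from Algorithm 1."""
    opt = torch.optim.Adam(model.parameters(), lr=beta_f)
    best_val, best_state, bad_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        for s, a, s_next in train_loader:
            loss = nll_loss(model, s, a, s_next)
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            val = sum(nll_loss(model, s, a, s_next).item()
                      for s, a, s_next in val_loader)
        if val < best_val:
            best_val, best_state, bad_epochs = val, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:   # stop when validation loss stops improving
                break
    model.load_state_dict(best_state)
    return model
```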
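The Experiment Setup row describes the core mechanism: the horizon H interpolates between model-free behavior (relying only on the Q-function) and model-based behavior (unrolling the learned model), with two Q-functions trained following Fujimoto et al. (2018) and the actor update backpropagating through the model rollout. A minimal sketch of such a pathwise objective follows; the helper names (maac_policy_objective, reward_fn) and the clipped-double-Q choice of taking the elementwise minimum are assumptions for illustration, not the authors' exact implementation.

```python
import torch

def maac_policy_objective(policy, model, reward_fn, q1, q2, s0, H, gamma):
    """Sketch of the pathwise H-step objective: unroll the learned model for H
    steps from initial states s0, then bootstrap with the minimum of the two
    Q-functions. Gradients flow through policy, model, and reward along the path."""
    s, ret, disc = s0, 0.0, 1.0
    for _ in range(H):
        a = policy(s)                    # differentiable (reparameterized) action
        ret = ret + disc * reward_fn(s, a)
        s = model(s, a)                  # differentiable predicted next state
        disc *= gamma
    a_H = policy(s)
    q_min = torch.min(q1(s, a_H), q2(s, a_H))   # clipped double Q-estimate
    return (ret + disc * q_min).mean()

# A policy update would ascend this objective with step size beta_pi, e.g.:
#   loss = -maac_policy_objective(policy, model, reward_fn, q1, q2, batch_s0, H, gamma)
#   opt_pi.zero_grad(); loss.backward(); opt_pi.step()
```

With H = 0 the sketch reduces to a purely model-free actor-critic update, while larger H leans more heavily on the learned model, which is the trade-off the quoted setup describes.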