Model-Augmented Actor-Critic: Backpropagating through Paths
Authors: Ignasi Clavera, Yao Fu, Pieter Abbeel
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation aims to examine the following questions: 1) How does MAAC compare against state-of-the-art model-based and model-free methods? ... In order to answer the posed questions, we evaluate our approach on model-based continuous control benchmark tasks in the MuJoCo simulator (Todorov et al., 2012; Wang et al., 2019). |
| Researcher Affiliation | Academia | Anonymous authors. Paper under double-blind review. |
| Pseudocode | Yes | Algorithm 1 MAAC |
| Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We evaluate our approach on model-based continuous control benchmark tasks in the MuJoCo simulator (Todorov et al., 2012; Wang et al., 2019). |
| Dataset Splits | Yes | The dynamics models are trained via maximum likelihood with early stopping on a validation set. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'MuJoCo simulator' but does not provide specific version numbers for it or any other software dependencies, libraries, or programming languages. |
| Experiment Setup | Yes | The horizon H in the proposed objective acts as a hyperparameter interpolating between model-free (when fully relying on the Q-function) and model-based (when using a longer horizon) behavior of the algorithm. In practice, two Q-functions are trained (Fujimoto et al., 2018), since this has been experimentally shown to yield better results. Algorithm 1 specifies βf, βπ, βQ as the learning rates for the model, policy, and Q-function, respectively (a sketch of this setup follows the table). |
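
The Experiment Setup row above refers to an H-step, pathwise policy objective with a terminal Q-function bootstrap and two Q-networks. Below is a minimal sketch of that idea, assuming a PyTorch-style implementation with a deterministic policy for brevity; it is not the authors' code. The network sizes, horizon `H`, discount, and learning rates (standing in for βf, βπ, βQ) are illustrative placeholders, not values reported in the paper.

```python
# Hedged sketch of an MAAC-style pathwise policy objective (not the authors' code).
# The policy is unrolled H steps through a learned dynamics model, model-predicted
# rewards are accumulated, and the return is bootstrapped with the minimum of two
# Q-functions. Gradients flow through the imagined path back into the policy.
import torch
import torch.nn as nn

obs_dim, act_dim, H, gamma = 8, 2, 5, 0.99  # placeholder dimensions and horizon

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(), nn.Linear(256, out))

dynamics = mlp(obs_dim + act_dim, obs_dim)    # learned model f(s, a) -> s'
reward_fn = mlp(obs_dim + act_dim, 1)         # learned (or given) reward model
policy = mlp(obs_dim, act_dim)                # deterministic policy for brevity
q1, q2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)  # two Q-functions

# Separate optimizers mirror the separate learning rates beta_f, beta_pi, beta_Q
# in Algorithm 1; the numeric values here are arbitrary placeholders.
opt_model = torch.optim.Adam(
    list(dynamics.parameters()) + list(reward_fn.parameters()), lr=1e-3)
opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_q = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)

def policy_objective(s):
    """H-step pathwise objective: rewards along an imagined rollout plus a
    discounted terminal value from the smaller of the two Q-functions."""
    ret, discount = 0.0, 1.0
    for _ in range(H):
        a = torch.tanh(policy(s))
        sa = torch.cat([s, a], dim=-1)
        ret = ret + discount * reward_fn(sa)
        s = dynamics(sa)                      # imagined next state from the model
        discount *= gamma
    a_H = torch.tanh(policy(s))
    sa_H = torch.cat([s, a_H], dim=-1)
    ret = ret + discount * torch.min(q1(sa_H), q2(sa_H))  # terminal Q bootstrap
    return -ret.mean()                        # maximize return -> minimize negative

# One illustrative policy update on a batch of starting states.
states = torch.randn(32, obs_dim)
loss = policy_objective(states)
opt_pi.zero_grad()
loss.backward()
opt_pi.step()
```

With H = 0 the objective reduces to the usual actor-critic update through the Q-function alone, while larger H leans more on the learned model, which is the hyperparameter trade-off the table row describes.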