Iterative Amortized Policy Optimization
Authors: Joseph Marino, Alexandre Piché, Alessandro Davide Ialongo, Yisong Yue
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that the resulting technique, iterative amortized policy optimization, yields performance improvements over direct amortization on benchmark continuous control tasks. Accompanying code: github.com/joelouismarino/variational_rl. Also, from the ethics statement: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes] We demonstrate performance improvements and novel benefits of iterative amortization in Section 4. Each claim is supported by empirical evidence. |
| Researcher Affiliation | Collaboration | Joseph Marino, California Institute of Technology; Alexandre Piché, Mila, Université de Montréal; Alessandro Davide Ialongo, University of Cambridge; Yisong Yue, California Institute of Technology. Now at DeepMind, London, UK. Correspondence to josephmarino@deepmind.com. |
| Pseudocode | Yes | Algorithm 1 Direct Amortization; Algorithm 2 Iterative Amortization |
| Open Source Code | Yes | Accompanying code: github.com/joelouismarino/variational_rl. Also from the ethics statement: Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We have included code in the supplementary material with an accompanying README file. |
| Open Datasets | Yes | We evaluate iterative amortized policy optimization on the suite of MuJoCo [78] continuous control tasks from OpenAI gym [12]. |
| Dataset Splits | Yes | From the ethics statement: Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] All details and hyperparameters are provided in the Appendix. We evaluate iterative amortized policy optimization on the suite of MuJoCo [78] continuous control tasks from OpenAI gym [12]. |
| Hardware Specification | Yes | From the ethics statement: Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] We provide these details in the supplementary material. |
| Software Dependencies | No | The paper mentions and cites several software packages and libraries (e.g., NumPy, PyTorch, MuJoCo, OpenAI Gym) that were used or are relevant to the work. However, it does not explicitly list specific version numbers for these software dependencies that would be required to reproduce the experimental environment. |
| Experiment Setup | Yes | From the ethics statement: Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] All details and hyperparameters are provided in the Appendix. |