Iterative Amortized Policy Optimization

Authors: Joseph Marino, Alexandre Piché, Alessandro Davide Ialongo, Yisong Yue

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that the resulting technique, iterative amortized policy optimization, yields performance improvements over direct amortization on benchmark continuous control tasks." Accompanying code: github.com/joelouismarino/variational_rl. From the ethics statement: "Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes] We demonstrate performance improvements and novel benefits of iterative amortization in Section 4. Each claim is supported by empirical evidence."
Researcher Affiliation | Collaboration | Joseph Marino (California Institute of Technology); Alexandre Piché (Mila, Université de Montréal); Alessandro Davide Ialongo (University of Cambridge); Yisong Yue (California Institute of Technology). Now at DeepMind, London, UK; correspondence to josephmarino@deepmind.com.
Pseudocode | Yes | Algorithm 1 (Direct Amortization); Algorithm 2 (Iterative Amortization)
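The row above names the paper's two pseudocode algorithms. As a rough, hypothetical illustration of the distinction only (not the authors' implementation; all names, network shapes, and the toy objective below are assumptions), direct amortization maps a state to policy parameters in one feedforward pass, while iterative amortization repeatedly refines a parameter estimate using gradients of the optimization objective:

```python
import torch

STATE_DIM, ACTION_DIM = 3, 3

def objective(mu, log_sigma, state):
    # Toy stand-in for the variational objective (a fit term plus an
    # entropy-like term); the paper optimizes an estimated Q-value instead.
    return -((mu - state) ** 2).sum() + log_sigma.sum()

# Direct amortization: a single feedforward pass from the state straight
# to the Gaussian policy parameters (mu, log_sigma).
direct_net = torch.nn.Linear(STATE_DIM, 2 * ACTION_DIM)

def direct_policy(state):
    mu, log_sigma = direct_net(state).chunk(2)
    return mu, log_sigma

# Iterative amortization: an update network refines the current parameter
# estimate over several steps, conditioned on the objective's gradients.
update_net = torch.nn.Linear(4 * ACTION_DIM, 2 * ACTION_DIM)

def iterative_policy(state, num_steps=5):
    mu = torch.zeros(ACTION_DIM, requires_grad=True)
    log_sigma = torch.zeros(ACTION_DIM, requires_grad=True)
    for _ in range(num_steps):
        grad_mu, grad_ls = torch.autograd.grad(
            objective(mu, log_sigma, state), (mu, log_sigma))
        # The update network sees the current estimate and its gradients,
        # and proposes an additive refinement.
        delta = update_net(
            torch.cat([mu, log_sigma, grad_mu, grad_ls]).detach())
        d_mu, d_ls = delta.chunk(2)
        mu = (mu + d_mu).detach().requires_grad_(True)
        log_sigma = (log_sigma + d_ls).detach().requires_grad_(True)
    return mu, log_sigma
```

In this sketch, training the update network (rather than a direct policy network) is what allows the policy parameters to improve over multiple refinement steps per state.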
Open Source Code | Yes | Accompanying code: github.com/joelouismarino/variational_rl. From the ethics statement: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We have included code in the supplementary material with an accompanying README file."
Open Datasets | Yes | "We evaluate iterative amortized policy optimization on the suite of MuJoCo [78] continuous control tasks from OpenAI Gym [12]."
Dataset Splits | Yes | From the ethics statement: "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] All details and hyperparameters are provided in the Appendix." Also: "We evaluate iterative amortized policy optimization on the suite of MuJoCo [78] continuous control tasks from OpenAI Gym [12]."
Hardware Specification | Yes | From the ethics statement: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] We provide these details in the supplementary material."
Software Dependencies | No | The paper mentions and cites several software packages and libraries (e.g., NumPy, PyTorch, MuJoCo, OpenAI Gym), but it does not list specific version numbers for these dependencies, which would be needed to reproduce the experimental environment.
Experiment Setup | Yes | From the ethics statement: "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] All details and hyperparameters are provided in the Appendix."