Policy Optimization via Importance Sampling

Authors: Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
Researcher Affiliation | Academia | Alberto Maria Metelli, Politecnico di Milano, Milan, Italy, albertomaria.metelli@polimi.it; Matteo Papini, Politecnico di Milano, Milan, Italy, matteo.papini@polimi.it; Francesco Faccio, Politecnico di Milano, Milan, Italy and IDSIA, USI-SUPSI, Lugano, Switzerland, francesco.faccio@mail.polimi.it; Marcello Restelli, Politecnico di Milano, Milan, Italy, marcello.restelli@polimi.it
Pseudocode | Yes | The pseudo-code of POIS is reported in Algorithm 1. (Also Algorithm 2; see the illustrative sketch after this table.)
Open Source Code | Yes | The implementation of POIS can be found at https://github.com/T3p/pois.
Open Datasets | Yes | ...on classical control tasks [12, 57]. (Reference [12] is "Benchmarking deep reinforcement learning for continuous control", which uses standard environments.)
Dataset Splits | No | The paper describes using a "current policy" to collect trajectories for optimization and performing "offline optimization". It does not explicitly mention fixed training, validation, or test dataset splits with percentages or counts, as is common in supervised learning contexts.
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40cm, Titan XP and Tesla V100 used for this research.
Software Dependencies | No | The paper does not specify versions for any software dependencies, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup | Yes | All experimental details are provided in Appendix F. (Appendix F.1 mentions: "For linear policies we used a learning rate α = 0.001 and a batch size N = 20 trajectories." Appendix F.2 mentions: "We adopted the same network architecture for all environments: 3 layers: 100, 50, 25 neurons each." A hedged configuration sketch is given after this table.)
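The Pseudocode row refers to Algorithm 1 (POIS), which alternates between collecting a batch of trajectories with the current policy and optimizing an importance-sampling surrogate offline on that batch. The sketch below illustrates that general recipe only; it is not the authors' code. The linear Gaussian policy, the unit action noise, and the penalty coefficient are assumptions made here for brevity, while in the paper the penalty is derived from a Rényi-divergence concentration bound on the importance-sampling estimator.

```python
# Minimal NumPy sketch of a POIS-style offline step (illustrative, not the
# authors' implementation): importance-weighted return estimate penalized by
# the empirical second moment of the weights, which estimates the
# exponentiated 2-Renyi divergence used in the paper's bound.
import numpy as np

def gaussian_logp(actions, mean, std):
    """Log-density of actions under a diagonal Gaussian policy."""
    return -0.5 * np.sum(((actions - mean) / std) ** 2
                         + 2 * np.log(std) + np.log(2 * np.pi), axis=-1)

def surrogate(theta, behav_theta, states, actions, returns, penalty_coeff):
    """Importance-sampled return minus an uncertainty penalty for one batch.

    states:  (N, T, obs_dim) observations of N trajectories of length T
    actions: (N, T, act_dim) actions taken by the behavioural policy
    returns: (N,) total return of each trajectory
    """
    # Per-trajectory log importance weights: target policy vs. behavioural policy.
    logw = (gaussian_logp(actions, states @ theta, 1.0).sum(axis=1)
            - gaussian_logp(actions, states @ behav_theta, 1.0).sum(axis=1))
    w = np.exp(logw)
    j_hat = np.mean(w * returns)   # importance-sampling estimate of J(theta)
    d2_hat = np.mean(w ** 2)       # empirical estimate of the 2-Renyi term
    return j_hat - penalty_coeff * np.sqrt(d2_hat / len(returns))

# In POIS this surrogate is maximized offline (e.g. by gradient ascent) on the
# fixed batch, and only then is a new batch collected under the updated policy.
```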
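The Experiment Setup row quotes a learning rate of 0.001, a batch size of 20 trajectories, and a 100-50-25 network for deep policies. The configuration sketch below simply restates those quoted values in code; PyTorch, tanh activations, and the Gaussian-mean output head are assumptions, and the released implementation at https://github.com/T3p/pois may differ.

```python
# Hedged sketch of the quoted setup from Appendix F (not the original code).
import torch.nn as nn

class MLPPolicy(nn.Module):
    """Deep policy with the 100-50-25 hidden architecture quoted from Appendix F.2."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 100), nn.Tanh(),
            nn.Linear(100, 50), nn.Tanh(),
            nn.Linear(50, 25), nn.Tanh(),
            nn.Linear(25, act_dim),   # mean of a Gaussian action distribution (assumed)
        )

    def forward(self, obs):
        return self.net(obs)

config = {
    "learning_rate": 0.001,  # alpha quoted in Appendix F.1 (linear policies)
    "batch_size": 20,        # N = 20 trajectories per offline optimization batch
}
```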