Policy Optimization via Importance Sampling
Authors: Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods. |
| Researcher Affiliation | Academia | Alberto Maria Metelli Politecnico di Milano, Milan, Italy albertomaria.metelli@polimi.it; Matteo Papini Politecnico di Milano, Milan, Italy matteo.papini@polimi.it; Francesco Faccio Politecnico di Milano, Milan, Italy IDSIA, USI-SUPSI, Lugano, Switzerland francesco.faccio@mail.polimi.it; Marcello Restelli Politecnico di Milano, Milan, Italy marcello.restelli@polimi.it |
| Pseudocode | Yes | The pseudo-code of POIS is reported in Algorithm 1 (Algorithm 2 is also provided). |
| Open Source Code | Yes | The implementation of POIS can be found at https://github.com/T3p/pois. |
| Open Datasets | Yes | ...on classical control tasks [12, 57]. (Reference [12] is "Benchmarking deep reinforcement learning for continuous control" which uses standard environments.) |
| Dataset Splits | No | The paper describes using a "current policy" to collect trajectories for optimization, and performing "offline optimization". It does not explicitly mention fixed training, validation, or test dataset splits with percentages or counts, as is common in supervised learning contexts. |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40c, Titan Xp and Tesla V100 used for this research. |
| Software Dependencies | No | The paper does not specify versions for any software dependencies, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | All experimental details are provided in Appendix F. Appendix F.1 states: "For linear policies we used a learning rate α = 0.001 and a batch size N = 20 trajectories." Appendix F.2 specifies the same network architecture for all environments: three hidden layers with 100, 50, and 25 neurons. (An illustrative sketch assembling these details follows the table.) |
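
For illustration only, the sketch below assembles the quoted setup details (a policy network with 100, 50, and 25 hidden neurons, learning rate 0.001, batch size of 20 trajectories) into a minimal PyTorch snippet. All class and variable names are assumptions; this is not the authors' implementation (available at https://github.com/T3p/pois), and the placeholder update shown is not the POIS surrogate objective from Algorithm 1.

```python
# Hypothetical sketch of the deep policy described in Appendix F.2
# (3 hidden layers with 100, 50, and 25 neurons). Names and activation
# choices are illustrative assumptions, not taken from the authors' code.
import torch
import torch.nn as nn


class GaussianMLPPolicy(nn.Module):
    """Gaussian policy whose mean comes from a 100-50-25 MLP (tanh activations assumed)."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 100), nn.Tanh(),
            nn.Linear(100, 50), nn.Tanh(),
            nn.Linear(50, 25), nn.Tanh(),
        )
        self.mean_head = nn.Linear(25, act_dim)
        # State-independent log standard deviation (a common choice; an assumption here).
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mean = self.mean_head(self.body(obs))
        return torch.distributions.Normal(mean, self.log_std.exp())


if __name__ == "__main__":
    policy = GaussianMLPPolicy(obs_dim=8, act_dim=2)
    # The quoted learning rate (0.001) and batch size (20 trajectories) refer to the
    # linear-policy experiments; they are reused here only to make the sketch concrete.
    optimizer = torch.optim.Adam(policy.parameters(), lr=0.001)
    batch_size_trajectories = 20

    obs = torch.randn(batch_size_trajectories, 8)
    actions = policy(obs).sample()

    # Placeholder objective (NOT the POIS surrogate): maximize log-likelihood of the
    # sampled actions, just to show the hyperparameters in a runnable update.
    loss = -policy(obs).log_prob(actions).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(actions.shape)  # torch.Size([20, 2])
```

The POIS algorithm itself performs offline optimization of an importance-sampling surrogate (Algorithm 1 in the paper); the generic optimizer above is only a stand-in to make the quoted hyperparameters concrete.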