Operator World Models for Reinforcement Learning

Authors: Pietro Novelli, Marco Pratticò, Massimiliano Pontil, Carlo Ciliberto

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Preliminary experiments in finite and infinite state settings support the effectiveness of our method. [...] We empirically evaluated POWR on classical Gym environments [26], ranging from discrete (FrozenLake-v1, Taxi-v3) to continuous state spaces (MountainCar-v0).
Researcher Affiliation | Academia | Pietro Novelli, Istituto Italiano di Tecnologia, pietro.novelli@iit.it; Marco Pratticò, Istituto Italiano di Tecnologia, marco.prattico@iit.it; Massimiliano Pontil, Istituto Italiano di Tecnologia and AI Centre, University College London, massimiliano.pontil@iit.it; Carlo Ciliberto, AI Centre, University College London, c.ciliberto@ucl.ac.uk
Pseudocode | Yes | Algorithm 1: POWR: Policy Mirror Descent with Operator World-Models for RL
Open Source Code | Yes | Code available at: github.com/CSML-IIT-UCL/powr
Open Datasets | Yes | We empirically evaluated POWR on classical Gym environments [26], ranging from discrete (FrozenLake-v1, Taxi-v3) to continuous state spaces (MountainCar-v0). [26] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
Dataset Splits | No | The paper mentions "training runs" and "test environments" but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions using "stable baselines library [48]" for baselines. However, it does not specify a version number for this library or any other software dependencies, which is required for reproducibility.
Experiment Setup | No | The paper states, "We used the standard hyperparameters in [48]." (Appendix D.2). This refers to an external source for hyperparameters rather than listing them explicitly within the paper's text.
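
The Software Dependencies row flags missing version numbers. As a rough illustration of how such versions could be recorded alongside experiment results, the sketch below queries installed package versions with Python's standard importlib.metadata module. It is not taken from the paper or its repository; the helper names collect_versions and format_requirements are hypothetical, and the package names in the usage section (stable-baselines3, gymnasium, numpy) are assumptions about what an environment for these experiments might contain.

```python
from importlib import metadata


def collect_versions(packages):
    """Return {package: version or None} for each requested distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the current environment.
            versions[name] = None
    return versions


def format_requirements(versions):
    """Render pinned requirement lines; missing packages become comment lines."""
    lines = []
    for name in sorted(versions):
        version = versions[name]
        if version is None:
            lines.append(f"# {name}: not installed")
        else:
            lines.append(f"{name}=={version}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical package list; adjust to the libraries actually used.
    packages = ["stable-baselines3", "gymnasium", "numpy"]
    print(format_requirements(collect_versions(packages)))
```

Saving this output next to the experiment logs would give readers the exact library versions needed to rerun the baselines, which is the information the row above reports as absent.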