Operator World Models for Reinforcement Learning
Authors: Pietro Novelli, Marco Pratticò, Massimiliano Pontil, Carlo Ciliberto
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Preliminary experiments in finite and infinite state settings support the effectiveness of our method. [...] We empirically evaluated POWR on classical Gym environments [26], ranging from discrete (FrozenLake-v1, Taxi-v3) to continuous state spaces (MountainCar-v0). |
| Researcher Affiliation | Academia | Pietro Novelli (Istituto Italiano di Tecnologia; pietro.novelli@iit.it); Marco Pratticò (Istituto Italiano di Tecnologia; marco.prattico@iit.it); Massimiliano Pontil (Istituto Italiano di Tecnologia and AI Centre, University College London; massimiliano.pontil@iit.it); Carlo Ciliberto (AI Centre, University College London; c.ciliberto@ucl.ac.uk) |
| Pseudocode | Yes | Algorithm 1 POWR: POLICY MIRROR DESCENT WITH OPERATOR WORLD-MODELS FOR RL |
| Open Source Code | Yes | Code available at: github.com/CSML-IIT-UCL/powr |
| Open Datasets | Yes | We empirically evaluated POWR on classical Gym environments [26], ranging from discrete (FrozenLake-v1, Taxi-v3) to continuous state spaces (MountainCar-v0). [26] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016. |
| Dataset Splits | No | The paper mentions "training runs" and "test environments" but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "stable baselines library [48]" for the baselines, but it does not specify a version number for this library or for any other software dependency; such version numbers are required for reproducibility. |
| Experiment Setup | No | The paper states, "We used the standard hyperparameters in [48]." (Appendix D.2). This refers to an external source for hyperparameters rather than listing them explicitly within the paper's text. |