Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning Interactive World Model for Object-Centric Reinforcement Learning

Authors: Fan Feng, Phillip Lippe, Sara Magliacane

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | To evaluate the effectiveness of our proposed interactive world model and policy learning framework, we aim to address the following questions: (i) How accurately does the model learns the state disentanglement and interaction models? (ii) How well does it perform in long-horizon task learning? and (iii) How well does the framework achieve compositional generalization? To answer these questions, we evaluate our method on a range of simulated control, robotic manipulation, and embodied AI benchmarks, including Sprites World [77], OpenAI-Gym Fetch [78], iGibson [79], and Libero [80]. |
| Researcher Affiliation | Academia | Fan Feng (1, 2), Phillip Lippe (3), Sara Magliacane (3). 1: University of California San Diego; 2: Mohamed bin Zayed University of Artificial Intelligence; 3: University of Amsterdam |
| Pseudocode | Yes | Algorithm 1 FIOC-WM: Offline World Model and Online Policy Learning (Simplified) |
| Open Source Code | No | The code and data will be publicly available after acceptance. |
| Open Datasets | Yes | To answer these questions, we evaluate our method on a range of simulated control, robotic manipulation, and embodied AI benchmarks, including Sprites World [77], OpenAI-Gym Fetch [78], iGibson [79], and Libero [80]. |
| Dataset Splits | No | The paper describes data collection (e.g., "collect 3000 episodes with random actions") and training steps, but does not provide explicit dataset splits (e.g., percentages or counts for training, validation, and test sets) from a single collected dataset. |
| Hardware Specification | Yes | Compute used for training the FIOC-WM: For Sprites-World, we use 3 hours on 1x NVIDIA A100; For Fetch, we use 8 hours on 6x NVIDIA 4090; For i-Gibson, we use 9 hours on 6x NVIDIA 4090; For Libero-object, we use 8 hours on 1x NVIDIA A100; For Kitchen, we use 6 hours on 1x NVIDIA A100. |
| Software Dependencies | No | The paper mentions models like DINO-v2 and R3M and framework components like MLP and GRU layers, but it does not specify version numbers for any underlying software dependencies like Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | For the VAE used to learn latent states, we employ a two-layer MLP with a hidden size of 256. The specific hyperparameters for different environments are detailed in Table A7. The hyperparameters for the loss terms are set as {α, β, γ, η} = {1, 0.05, 0.1, 0.2}, and the learning rate is set to 3 × 10^-4. For MPC, we use gradient descent with a learning rate of 5 × 10^-5. For those using PPO, we set the learning rate to 3 × 10^-4 with a clip ratio of 0.1. The MLP architecture consists of hidden sizes [256, 256] for Gym-Fetch, while for other environments, we use [512, 512]. Generalized Advantage Estimation (GAE) is set to 0.95 for all environments, and the entropy coefficient is 0.1. |
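
For readers who want to reuse the reported values, the Experiment Setup and Hardware Specification entries can be transcribed into a small Python structure. This is only a sketch: every key and function name below is hypothetical and not taken from the authors' code (which was not released at the time of classification). It also makes two assumptions that the quoted text does not confirm: the first 3 × 10^-4 learning rate is attributed to world-model training, and the GPU-hour figure assumes each reported duration is wall-clock time with all listed GPUs in use.

```python
# Values transcribed from the "Experiment Setup" row.
# All key names are hypothetical; they do not come from the authors' code.
REPORTED_SETUP = {
    "vae": {"mlp_layers": 2, "hidden_size": 256},
    "loss_weights": {"alpha": 1.0, "beta": 0.05, "gamma": 0.1, "eta": 0.2},
    # Assumption: the learning rate quoted next to the loss weights
    # belongs to world-model training.
    "world_model_lr": 3e-4,
    "mpc": {"optimizer": "gradient_descent", "lr": 5e-5},
    # The source says "GAE is set to 0.95"; the key name "gae" is kept
    # neutral because the parameter it refers to is not stated.
    "ppo": {"lr": 3e-4, "clip_ratio": 0.1, "gae": 0.95, "entropy_coef": 0.1},
}


def policy_hidden_sizes(env_name: str) -> list[int]:
    """Return the reported policy MLP hidden sizes for an environment."""
    return [256, 256] if env_name == "Gym-Fetch" else [512, 512]


# Values transcribed from the "Hardware Specification" row:
# environment -> (hours, number of GPUs, GPU model).
REPORTED_COMPUTE = {
    "Sprites-World": (3, 1, "NVIDIA A100"),
    "Fetch": (8, 6, "NVIDIA 4090"),
    "i-Gibson": (9, 6, "NVIDIA 4090"),
    "Libero-object": (8, 1, "NVIDIA A100"),
    "Kitchen": (6, 1, "NVIDIA A100"),
}


def gpu_hours(compute: dict) -> dict:
    """Estimate GPU-hours per environment as hours * number of GPUs.

    Assumption: the reported hours are wall-clock time with every
    listed GPU busy for the full duration.
    """
    return {env: hours * gpus for env, (hours, gpus, _model) in compute.items()}


if __name__ == "__main__":
    per_env = gpu_hours(REPORTED_COMPUTE)
    for env, total in per_env.items():
        print(f"{env}: {total} GPU-hours")
    print("Total:", sum(per_env.values()), "GPU-hours")
```

Under these assumptions, world-model training across the five environments comes to 3 + 48 + 54 + 8 + 6 = 119 GPU-hours. Per-environment hyperparameters from Table A7 of the paper are not reproduced in the entries above and are therefore omitted here.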