Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
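The validation mentioned in the notice — comparing LLM-produced labels against a manually labeled dataset — amounts to a per-variable agreement computation. A minimal sketch follows; the records, variable names, and labels are illustrative placeholders, not the actual validation data from [1]:

```python
from collections import defaultdict

# Hypothetical records: (variable, manual_label, llm_label).
records = [
    ("Research Type", "Experimental", "Experimental"),
    ("Open Source Code", "Yes", "Yes"),
    ("Dataset Splits", "No", "Yes"),
    ("Hardware Specification", "Yes", "Yes"),
]

def per_variable_accuracy(rows):
    """Fraction of LLM labels that match the manual label, per variable."""
    hits, totals = defaultdict(int), defaultdict(int)
    for variable, manual, predicted in rows:
        totals[variable] += 1
        hits[variable] += int(manual == predicted)
    return {v: hits[v] / totals[v] for v in totals}

print(per_variable_accuracy(records))
```

A full evaluation would also report per-class metrics (e.g., precision/recall per label), since simple accuracy can mask systematic bias in the LLM classifier.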

Open-World Reinforcement Learning over Long Short-Term Imagination

Authors: Jiajian Li, Qi Wang, Yunbo Wang, Xin Jin, Yang Li, Wenjun Zeng, Xiaokang Yang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our approach in the challenging open-world tasks from MineDojo (Fan et al., 2022). LS-Imagine demonstrates superior performance compared to existing visual RL methods. We evaluate all the Minecraft agents in terms of success rate shown in Figure 4 and per-episode steps shown in Figure 5. We find that LS-Imagine significantly outperforms the compared models...
Researcher Affiliation Academia 1 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China 2 Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China 3 School of Computer Science and Technology, East China Normal University, Shanghai, China
Pseudocode Yes Algorithm 1 The training pipeline of LS-Imagine.
Open Source Code Yes We provide the source code at https://github.com/qiwang067/LS-Imagine.
Open Datasets Yes We explore LS-Imagine on the challenging MineDojo (Fan et al., 2022) benchmark on top of the popular Minecraft game, which is a comprehensive simulation platform with various open-ended tasks. All environments and datasets used are synthetic and publicly available.
Dataset Splits No The paper mentions collecting 2,000 images for finetuning a U-Net component: "To generate accurate affordance maps, we collect 2,000 images from the environment using a random agent under the current task instruction and generate a discrete set of (o_t, I, M_{o_t,I}), which are then used to finetune the multimodal U-Net for 200 epochs." For the main experiments on MineDojo tasks, however, it does not explicitly state dataset splits (e.g., train/validation/test percentages or counts); it instead refers to benchmark tasks, which typically follow predefined evaluation protocols rather than author-defined splits.
Hardware Specification Yes Each training of LS-Imagine takes approximately 23 GB of VRAM and requires around 1.7 days to complete on a single RTX 3090 GPU.
Software Dependencies No The paper mentions using specific software tools such as MineCLIP and Swin-Unet, building on DreamerV3 as a base, and references a multimodal U-Net, but it does not specify version numbers for these software components or for general programming languages/libraries such as Python or PyTorch.
Experiment Setup Yes Implementation details. We conduct our experiments on the MineDojo environment, where both visual observation and corresponding affordance maps are resized to 64 × 64 pixels. ...For tasks in the MineDojo benchmark, we train the agent for 1 × 10^6 environment steps. ...The final hyperparameters of LS-Imagine are shown in Table 4.