Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models

Authors: Farzane Aminmansour, Taher Jafferjee, Ehsan Imani, Erin J. Talvitie, Michael Bowling, Martha White

JAIR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results provide evidence for the HVH, and suggest that using predecessor models with multi-step updates is a promising direction toward developing Dyna algorithms that are more robust to model error. We introduce an environment to test the hypothesis and show that previous variants of Dyna fail when the model is imperfect whereas our algorithm does not. We further test the algorithms on three classic benchmark environments and find even in these environments the same behavior persists.
Researcher Affiliation | Academia | Farzane Aminmansour, Taher Jafferjee, Ehsan Imani (Dept. of Computing Science & the Alberta Machine Intelligence Institute, University of Alberta, Canada); Erin J. Talvitie (Dept. of Computer Science, Harvey Mudd College, USA); Michael Bowling, Martha White (Dept. of Computing Science & Amii, University of Alberta, Canada)
Pseudocode | Yes | Algorithm 1 Original Dyna-Q ... Algorithm 2 Prioritised-Dyna with Multi-step Updates ... Algorithm 3 Planning Update ... Algorithm 4 Pop Tuple and Screen ... Algorithm 5 Is On Policy
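The listed algorithms appear in the paper as pseudocode only. As a rough illustration of the tabular Dyna-Q loop that Algorithm 1 refers to, one step of direct learning followed by planning updates from a learned model might be sketched as follows. This is a generic, hypothetical sketch, not the authors' prioritised multi-step variant; all names are illustrative:

```python
import random
from collections import defaultdict

def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.1, gamma=0.95, n_planning=1):
    """One direct Q-learning update plus n_planning simulated updates."""
    # Direct RL update from the real transition (s, a, r, s_next)
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions)
                          - Q[(s, a)])
    # Record the transition in a deterministic tabular model
    model[(s, a)] = (r, s_next)
    # Planning: replay simulated transitions from previously visited pairs
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps_next, b)] for b in actions)
                                - Q[(ps, pa)])
    return Q
```

A caller would typically hold `Q = defaultdict(float)` and `model = {}`, invoking `dyna_q_step` once per environment step (matching the paper's N = 1 planning updates per step).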
Open Source Code | No | The paper mentions "Pygame Learning Environment. https://github.com/ntasfi/PyGame-Learning-Environment" but this is a third-party tool used for experiments, not the authors' own source code for their methodology. There is no explicit statement or link provided by the authors for their own code release.
Open Datasets | Yes | Our experiments were conducted on three benchmarks: Cartpole (Brockman, Cheung, Pettersson, Schneider, Schulman, Tang, & Zaremba, 2016), Puddleworld (Degris, White, & Sutton, 2012), and Catcher (Tasfi, 2016).
Dataset Splits | No | To learn the offline model, following the method of (Oh, Guo, Lee, Lewis, & Singh, 2015), we collected 100,000 training samples by executing a pre-trained agent on the environment with ϵ = 0.5. This describes data collection for the model, but not explicit train/test/validation splits for the main experiments or model evaluation.
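The quoted collection procedure (ϵ-greedy rollouts from a pre-trained agent) might look roughly like the sketch below. Here `env` and `greedy_action` are hypothetical gym-style interfaces introduced only for illustration, not the authors' code:

```python
import random

def collect_samples(env, greedy_action, n_samples=100_000, epsilon=0.5):
    """Collect (s, a, r, s') transitions with an epsilon-greedy policy
    built around a pre-trained greedy action selector."""
    samples = []
    s = env.reset()
    while len(samples) < n_samples:
        # With probability epsilon act randomly, otherwise act greedily
        if random.random() < epsilon:
            a = env.sample_action()
        else:
            a = greedy_action(s)
        s_next, r, done = env.step(a)
        samples.append((s, a, r, s_next))
        s = env.reset() if done else s_next
    return samples
```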
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using Q-learning and DQN algorithms, as well as the Pygame Learning Environment, but it does not specify any version numbers for these or other software libraries/dependencies.
Experiment Setup | Yes | The value of α has been selected by sweeping over a set of {0.1, 0.25, 0.5, 0.75, 0.05, 0.125} and β by sweeping over {0.0, 0.15, 0.33, 0.50, 0.66, 0.75, 0.90, 1.0}. All agents use N = 1 planning updates per step, where each planning update iterates over all actions. We trained a network with 200 hidden units to convergence using the DQN algorithm and froze its weights. We initialised weights of the linear learner using samples from N(0, 1).
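The quoted setup (grid sweeps over α and β, linear weights drawn from N(0, 1)) could be mirrored in a short sketch like the one below. The `evaluate` callback is a hypothetical scoring function standing in for a full training run; only the grids themselves come from the paper:

```python
import itertools
import random

# Hyperparameter grids as quoted in the paper's setup
ALPHAS = [0.1, 0.25, 0.5, 0.75, 0.05, 0.125]
BETAS = [0.0, 0.15, 0.33, 0.50, 0.66, 0.75, 0.90, 1.0]

def init_linear_weights(n_features, rng=random):
    # Linear learner weights initialised from N(0, 1), per the paper
    return [rng.gauss(0.0, 1.0) for _ in range(n_features)]

def sweep(evaluate):
    """Return the (alpha, beta) pair maximising the hypothetical
    evaluate(alpha, beta) score over the full grid."""
    return max(itertools.product(ALPHAS, BETAS),
               key=lambda ab: evaluate(*ab))
```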