Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Adapting Behavior via Intrinsic Reward: A Survey and Empirical Study

Authors: Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White

JAIR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper is titled "Adapting Behavior via Intrinsic Reward: A Survey and Empirical Study". It includes sections such as "7. Experimental Setup", "8. Experiment One: Drifter-Distractor", "9. Experiment Two: Switched Drifter-Distractor Problem", and "10. Experiment Three: Jumpy Eight-Action Problem", which describe conducting experiments, analyzing data, and reporting performance using figures and tables.
Researcher Affiliation | Collaboration | Authors Cam Linke, Nadia M. Ady, and Martha White are affiliated with the University of Alberta (academic). Authors Thomas Degris and Adam White are affiliated with DeepMind (industry), with Adam White also affiliated with the University of Alberta. This mix of academic and industry affiliations indicates a collaboration.
Pseudocode | Yes | The paper includes a clearly labeled algorithm block: "Algorithm 1: The Autostep algorithm specialized to stateless prediction" in Section 4.
Open Source Code | Yes | The paper states: "To enable the reader to do this on their own, we have provided a python notebook to explore the full set of data, at http://jair.adaptingbehavior.com."
Open Datasets | Yes | The paper describes generating custom datasets for its experiments, and access to them is provided: "In this paper, we investigate and compare different intrinsic reward mechanisms in a new bandit-like parallel-learning testbed." and "To enable the reader to do this on their own, we have provided a python notebook to explore the full set of data, at http://jair.adaptingbehavior.com."
Dataset Splits | No | The paper describes a synthetic, online-learning environment in which data is generated continuously over time steps (e.g., "Phase one lasts for 50,000 time steps, then targets are permuted and remain fixed for the remainder of the experiment (another 100,000 steps)"). It specifies the duration and phases of a simulation rather than static training/validation/test splits of a fixed dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used to run its experiments. It describes the experimental setup and algorithms but not the underlying computing resources.
Software Dependencies | No | The paper mentions using a "python notebook" and references the Gradient Bandit, Dynamic Thompson Sampling, and Autostep algorithms, but it does not specify version numbers for Python or for any particular libraries, frameworks, or solvers.
Experiment Setup | Yes | The paper provides extensive detail on the experimental setup in Section 7 and Table 5, including specific hyperparameter configurations for the behavior agent (step-size α, average reward rate αr), the prediction learners (step-size αp, meta learning-rate κ), and the intrinsic rewards (smoothing parameter β, window lengths η and τ).
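The behavior-agent hyperparameters noted in the last row (step-size α and average reward rate αr) correspond to the standard gradient bandit with an average-reward baseline. As an illustration only, here is a minimal sketch of that update; this is not code from the paper or its notebook, and the names `GradientBandit`, `alpha`, and `alpha_r` are our own:

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

class GradientBandit:
    """Gradient bandit with an average-reward baseline.

    alpha   -- behavior step-size (the paper's α)
    alpha_r -- average-reward step-size (the paper's αr)
    In the paper's testbed the reward fed to `update` would be an
    intrinsic reward; here it is just a scalar.
    """
    def __init__(self, n_actions, alpha=0.1, alpha_r=0.01):
        self.prefs = [0.0] * n_actions
        self.alpha = alpha
        self.alpha_r = alpha_r
        self.r_bar = 0.0  # running average reward, used as baseline

    def act(self):
        """Sample an action from the softmax over preferences."""
        pi = softmax(self.prefs)
        r, cum = random.random(), 0.0
        for a, p in enumerate(pi):
            cum += p
            if r < cum:
                return a
        return len(pi) - 1

    def update(self, action, reward):
        """Preference-gradient update with average-reward baseline."""
        pi = softmax(self.prefs)
        self.r_bar += self.alpha_r * (reward - self.r_bar)
        for a in range(len(self.prefs)):
            indicator = 1.0 if a == action else 0.0
            self.prefs[a] += self.alpha * (reward - self.r_bar) * (indicator - pi[a])
```

With this update, actions whose reward exceeds the running average r̄ have their preferences raised, so the agent gradually concentrates probability on the most rewarding action.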