Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Adapting Behavior via Intrinsic Reward: A Survey and Empirical Study

Authors: Cam Linke, Nadia M. Ady, Martha White, Thomas Degris, Adam White

JAIR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper is titled "Adapting Behavior via Intrinsic Reward: A Survey and Empirical Study". It includes sections such as "7. Experimental Setup", "8. Experiment One: Drifter-Distractor", "9. Experiment Two: Switched Drifter-Distractor Problem", and "10. Experiment Three: Jumpy Eight-Action Problem", which describe conducting experiments, analyzing data, and reporting performance using figures and tables.
Researcher Affiliation | Collaboration | Authors Cam Linke, Nadia M. Ady, and Martha White are affiliated with the University of Alberta (academic). Authors Thomas Degris and Adam White are affiliated with DeepMind (industry), with Adam White also affiliated with the University of Alberta. This mix of academic and industry affiliations indicates a collaboration.
Pseudocode | Yes | The paper includes a clearly labeled algorithm block: "Algorithm 1: The Autostep algorithm specialized to stateless prediction" in Section 4.
Open Source Code | Yes | The paper states: "To enable the reader to do this on their own, we have provided a python notebook to explore the full set of data, at http://jair.adaptingbehavior.com."
Open Datasets | Yes | The paper describes generating custom datasets for its experiments, and access to them is provided: "In this paper, we investigate and compare different intrinsic reward mechanisms in a new bandit-like parallel-learning testbed." and "To enable the reader to do this on their own, we have provided a python notebook to explore the full set of data, at http://jair.adaptingbehavior.com."
Dataset Splits | No | The paper describes a synthetic, online-learning environment in which data is generated continuously over time steps (e.g., "Phase one lasts for 50,000 time steps, then targets are permuted and remain fixed for the remainder of the experiment (another 100,000 steps)"). It specifies the duration and phases of a simulation rather than static training/validation/test splits of a fixed dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used to run its experiments. It describes the experimental setup and algorithms but not the underlying computing resources.
Software Dependencies | No | The paper mentions using a "python notebook" and references the Gradient Bandit, Dynamic Thompson Sampling, and Autostep algorithms, but it does not specify version numbers for Python or for any particular libraries, frameworks, or solvers.
Experiment Setup | Yes | The paper provides extensive detail on the experimental setup in Section 7 and Table 5, including specific hyperparameter configurations for the behavior agent (step-size α, average reward rate αr), the prediction learners (step-size αp, meta learning-rate κ), and the intrinsic rewards (smoothing parameter β, window lengths η and τ).
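The behavior-agent hyperparameters noted in the last row (step-size α and average reward rate αr) correspond to the standard gradient bandit with an average-reward baseline. As an illustration only, here is a minimal sketch of that update; this is not code from the paper or its notebook, and the names `GradientBandit`, `alpha`, and `alpha_r` are our own:

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

class GradientBandit:
    """Gradient bandit with an average-reward baseline.

    alpha   -- behavior step-size (the paper's α)
    alpha_r -- average-reward step-size (the paper's αr)
    In the paper's testbed the reward fed to `update` would be an
    intrinsic reward; here it is just a scalar.
    """
    def __init__(self, n_actions, alpha=0.1, alpha_r=0.01):
        self.prefs = [0.0] * n_actions
        self.alpha = alpha
        self.alpha_r = alpha_r
        self.r_bar = 0.0  # running average reward, used as baseline

    def act(self):
        """Sample an action from the softmax over preferences."""
        pi = softmax(self.prefs)
        r, cum = random.random(), 0.0
        for a, p in enumerate(pi):
            cum += p
            if r < cum:
                return a
        return len(pi) - 1

    def update(self, action, reward):
        """Preference-gradient update with average-reward baseline."""
        pi = softmax(self.prefs)
        self.r_bar += self.alpha_r * (reward - self.r_bar)
        for a in range(len(self.prefs)):
            indicator = 1.0 if a == action else 0.0
            self.prefs[a] += self.alpha * (reward - self.r_bar) * (indicator - pi[a])
```

With this update, actions whose reward exceeds the running average r̄ have their preferences raised, so the agent gradually concentrates probability on the most rewarding action.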