Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents

Authors: Junyan Liu, Lillian J. Ratliff

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies the repeated principal-agent bandit game, in which the principal indirectly explores an unknown environment by incentivizing an agent to play arms. We propose algorithms for both the i.i.d. and linear reward settings with bandit feedback over a finite horizon T, achieving regret bounds of Õ(√T) and Õ(T^(2/3)), respectively.
Researcher Affiliation | Academia | (1) Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA; (2) Electrical & Computer Engineering, University of Washington, Seattle, WA, USA.
Pseudocode | Yes | Algorithm 1: Proposed algorithm for i.i.d. rewards
Open Source Code | No | The paper does not contain any statement about code availability or links to code repositories.
Open Datasets | No | The paper describes a theoretical framework for principal-agent bandit games and does not use any specific dataset for empirical evaluation.
Dataset Splits | No | The paper focuses on theoretical algorithm design and regret bounds for bandit games, and does not involve empirical evaluation on datasets; thus no dataset splits are provided.
Hardware Specification | No | The paper presents theoretical algorithms and regret analysis for bandit games, without conducting empirical experiments that would require specific hardware.
Software Dependencies | No | The paper focuses on theoretical algorithm design and mathematical analysis, without detailing specific software or library versions used for implementation or simulation.
Experiment Setup | No | The paper presents a theoretical study of principal-agent bandit games, focusing on algorithm design and regret bounds, and therefore does not describe any experimental setup or hyperparameters.