Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Incentivized Learning in Principal-Agent Bandit Games
Authors: Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine, Eric Moulines, Michael Jordan, El-Mahdi El-Mhamdi, Alain Oliviero Durmus
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we support our theoretical guarantees through numerical experiments. |
| Researcher Affiliation | Academia | 1 Centre de Mathématiques Appliquées, CNRS, École polytechnique, Institut Polytechnique de Paris, route de Saclay, 91128 Palaiseau cedex; 2 Université Paris-Saclay, CNRS, Laboratoire de mathématiques d'Orsay, 91405 Orsay, France; 3 INRIA, Université Paris Saclay, LMO, Orsay, France; 4 University of California, Berkeley; 5 Inria, École Normale Supérieure, PSL Research University. |
| Pseudocode | Yes | Algorithm 1: IPA; Algorithm 2: Contextual IPA; Algorithm 3: Binary Search Subroutine; Algorithm 4: UCB Subroutine; Algorithm 5: Projected Volume |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | No | We ran the experiments in Figure 2 for a horizon T = 10 000 on an average of 100 runs on a five arms bandit. We plotted the standard error across the different runs. The expected rewards for the principal (θ) and the agent (s) are given in Table 3. The principal's rewards Xa(t) are drawn from an i.i.d. distribution Xa(t) ∼ N(θa, 1) for any a ∈ [K], t ∈ [T]. The paper describes how the data for the 'toy example' experiments was generated, including specific parameters in Table 3, but does not provide a link or formal citation to a publicly available dataset. |
| Dataset Splits | No | The paper does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions comparing with the 'Principal's ε-Greedy algorithm of Dogan et al. (2023b)' and using a 'UCB instance' but does not specify any software names with version numbers (e.g., Python version, specific libraries or frameworks). |
| Experiment Setup | Yes | We ran the experiments in Figure 2 for a horizon T = 10 000 on an average of 100 runs on a five arms bandit. The expected rewards for the principal (θ) and the agent (s) are given in Table 3. The principal's rewards Xa(t) are drawn from an i.i.d. distribution Xa(t) ∼ N(θa, 1) for any a ∈ [K], t ∈ [T]. For the Principal's ε-Greedy algorithm, we use the hyperparameters α = 1 and m = 500. |
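
The reward-generation step quoted in the Experiment Setup row can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the values in `THETA` are hypothetical placeholders (Table 3 of the paper is not reproduced here), the uniformly random arm choice is a stand-in policy rather than IPA or ε-Greedy, and the default horizon and run count are reduced from the paper's T = 10 000 and 100 runs to keep the example fast.

```python
import math
import random

# Hypothetical mean rewards for five arms; NOT the values from the paper's Table 3.
THETA = [0.1, 0.3, 0.5, 0.7, 0.9]


def draw_rewards(theta, horizon, rng):
    """Return a horizon x K table with X_a(t) ~ N(theta_a, 1), drawn i.i.d."""
    return [[rng.gauss(mean, 1.0) for mean in theta] for _ in range(horizon)]


def mean_and_standard_error(values):
    """Sample mean and standard error of the mean across independent runs."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def simulate(theta=THETA, horizon=1000, n_runs=20, seed=0):
    """Average per-round reward of a placeholder uniform-random policy.

    The paper reports T = 10 000 and 100 runs; the smaller defaults here
    are an assumption made only to keep the sketch quick to execute.
    """
    rng = random.Random(seed)
    per_run_average = []
    for _ in range(n_runs):
        rewards = draw_rewards(theta, horizon, rng)
        # Placeholder policy: pick one arm uniformly at random each round.
        total = sum(row[rng.randrange(len(theta))] for row in rewards)
        per_run_average.append(total / horizon)
    return mean_and_standard_error(per_run_average)
```

Calling `simulate(horizon=10_000, n_runs=100)` would match the horizon and run count stated in the row; swapping the placeholder policy for an actual bandit algorithm is left to the reader.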