Incentivized Learning in Principal-Agent Bandit Games
Authors: Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine, Eric Moulines, Michael Jordan, El-Mahdi El-Mhamdi, Alain Oliviero Durmus
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we support our theoretical guarantees through numerical experiments. |
| Researcher Affiliation | Academia | 1. Centre de Mathématiques Appliquées, CNRS, École Polytechnique, Institut Polytechnique de Paris, Route de Saclay, 91128 Palaiseau cedex; 2. Université Paris-Saclay, CNRS, Laboratoire de Mathématiques d'Orsay, 91405 Orsay, France; 3. INRIA, Université Paris-Saclay, LMO, Orsay, France; 4. University of California, Berkeley; 5. Inria, École Normale Supérieure, PSL Research University. |
| Pseudocode | Yes | Algorithm 1: IPA; Algorithm 2: Contextual IPA; Algorithm 3: Binary Search Subroutine; Algorithm 4: UCB Subroutine; Algorithm 5: Projected Volume |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | No | We ran the experiments in Figure 2 for a horizon T = 10 000, averaged over 100 runs on a five-arm bandit. We plotted the standard error across the different runs. The expected rewards for the principal (θ) and the agent (s) are given in Table 3. The principal's rewards Xa(t) are drawn i.i.d. with Xa(t) ∼ N(θa, 1) for any a ∈ [K], t ∈ [T]. The paper describes how the data for the 'toy example' experiments was generated, including specific parameters in Table 3, but does not provide a link or formal citation to a publicly available dataset. |
| Dataset Splits | No | The paper does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions comparing with the 'Principal's ε-Greedy algorithm of Dogan et al. (2023b)' and using a 'UCB instance', but does not specify any software names with version numbers (e.g., Python version, specific libraries or frameworks). |
| Experiment Setup | Yes | We ran the experiments in Figure 2 for a horizon T = 10 000, averaged over 100 runs on a five-arm bandit. The expected rewards for the principal (θ) and the agent (s) are given in Table 3. The principal's rewards Xa(t) are drawn i.i.d. with Xa(t) ∼ N(θa, 1) for any a ∈ [K], t ∈ [T]. For the Principal's ε-Greedy algorithm, we use the hyperparameters α = 1 and m = 500. |
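
The Experiment Setup row specifies the toy experiment precisely enough for a rough reproduction. Below is a minimal Python sketch, under stated assumptions: the θ values are placeholders (Table 3 of the paper is not reproduced on this page), and a plain UCB learner stands in for the full IPA algorithm, whose incentive step (the Binary Search Subroutine over transfers to the agent) is omitted. It simulates 100 runs of a five-arm bandit over T = 10 000 with rewards Xa(t) ∼ N(θa, 1) and reports mean cumulative regret with its standard error, mirroring how Figure 2 is averaged and plotted; it is a sketch, not the authors' code.

```python
import numpy as np

# Placeholder values: Table 3 of the paper lists the actual expected rewards
# for the principal (theta) and the agent (s); these arrays are illustrative only.
theta = np.array([0.9, 0.7, 0.5, 0.3, 0.1])   # principal's expected rewards (assumed)
s = np.array([0.2, 0.4, 0.6, 0.8, 1.0])       # agent's expected rewards (assumed, unused here)

K = len(theta)        # five-arm bandit
T = 10_000            # horizon, as in the paper's toy experiment
N_RUNS = 100          # number of independent runs averaged in Figure 2
rng = np.random.default_rng(0)


def run_ucb(rng):
    """Plain UCB over the principal's rewards Xa(t) ~ N(theta_a, 1).

    Stands in for the UCB Subroutine (Algorithm 4); the incentive mechanism
    of IPA is omitted, so this only sketches the reward-generation process.
    Returns the per-step instantaneous (pseudo-)regret of the principal.
    """
    counts = np.zeros(K)
    means = np.zeros(K)
    inst_regret = np.zeros(T)
    best = theta.max()
    for t in range(T):
        if t < K:                      # pull each arm once to initialise
            a = t
        else:
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            a = int(np.argmax(means + bonus))
        x = rng.normal(theta[a], 1.0)  # principal's reward Xa(t) ~ N(theta_a, 1)
        counts[a] += 1
        means[a] += (x - means[a]) / counts[a]
        inst_regret[t] = best - theta[a]
    return inst_regret


# Average cumulative regret over N_RUNS independent runs, with standard error
# across runs, as described for Figure 2.
regrets = np.empty((N_RUNS, T))
for r in range(N_RUNS):
    regrets[r] = np.cumsum(run_ucb(rng))

mean_regret = regrets.mean(axis=0)
std_err = regrets.std(axis=0) / np.sqrt(N_RUNS)
print(f"final cumulative regret: {mean_regret[-1]:.1f} ± {std_err[-1]:.1f}")
```

Reproducing the paper's comparison would additionally require the Principal's ε-Greedy baseline of Dogan et al. (2023b) with α = 1 and m = 500, and the IPA incentive loop itself; neither is attempted here.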