Incentivized Bandit Learning with Self-Reinforcing User Preferences
Authors: Tianchen Zhou, Jia Liu, Chaosheng Dong, Jingyuan Deng
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct numerical simulations to demonstrate and verify the performances of these two policies and study their robustness under various settings. |
| Researcher Affiliation | Collaboration | ¹Department of Electrical and Computer Engineering, The Ohio State University, Columbus, Ohio, USA; ²Amazon, Seattle, Washington, USA. |
| Pseudocode | Yes | Policy 1: At-Least-n Explore-Then-Commit ... Policy 2: The UCB-List Policy |
| Open Source Code | No | The paper does not provide any statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | No | The paper conducts numerical simulations with defined parameters rather than using a publicly available dataset. It sets up simulation parameters like "a two-armed model with means µ = [0.3, 0.5] and initial biases θ = [100, 1]" but does not refer to a publicly accessible dataset. |
| Dataset Splits | No | The paper conducts numerical simulations with defined parameters rather than using a publicly available dataset with explicit train/validation/test splits. |
| Hardware Specification | No | The paper mentions "numerical simulations" but does not provide any specific details about the hardware used to run these simulations (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks) used for the simulations or implementation. |
| Experiment Setup | Yes | The simulation setting is as follows: a two-armed model with means µ = [0.3, 0.5] and initial biases θ = [100, 1], the feedback function F(x) = x^α with α = 1.5 and payment b = 1.5 with an incentive impact function G(x, t) = x. We use the optimal ALn ETC parameter q = 15. |
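
The experiment setup quoted above can be captured as a small configuration, which may help anyone attempting to re-run the simulations given that no source code is released. The following is a minimal sketch, not the authors' implementation. The names `SimulationConfig`, `feedback`, `incentive_impact`, and `preference_probabilities` are hypothetical. The normalization used in `preference_probabilities` (each arm weighted by F applied to its initial bias plus cumulative reward) is an assumption about how the self-reinforcing preferences enter the model and should be checked against the paper before use.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SimulationConfig:
    """Parameters quoted in the Experiment Setup row (field names are hypothetical)."""
    means: Tuple[float, ...] = (0.3, 0.5)             # arm means, mu
    initial_biases: Tuple[float, ...] = (100.0, 1.0)  # initial biases, theta
    alpha: float = 1.5                                # exponent in F(x) = x^alpha
    payment: float = 1.5                              # payment b
    q: int = 15                                       # reported explore-then-commit parameter


def feedback(x: float, alpha: float) -> float:
    """Feedback function F(x) = x^alpha, as stated in the setup."""
    return x ** alpha


def incentive_impact(x: float, t: int) -> float:
    """Incentive impact function G(x, t) = x, as stated in the setup."""
    return x


def preference_probabilities(
    config: SimulationConfig, cumulative_rewards: Sequence[float]
) -> List[float]:
    """Probability that a user prefers each arm when no incentive is offered.

    ASSUMPTION: arm i is weighted by F(theta_i + cumulative reward of arm i)
    and the weights are normalized to sum to one. This form is not taken
    from the summary above.
    """
    weights = [
        feedback(theta + s, config.alpha)
        for theta, s in zip(config.initial_biases, cumulative_rewards)
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    cfg = SimulationConfig()
    # Before any rewards accumulate, the heavily biased arm dominates.
    print(preference_probabilities(cfg, [0.0, 0.0]))
    print(incentive_impact(cfg.payment, t=1))
```

Under this assumed form, the initial biases θ = [100, 1] make the lower-mean arm overwhelmingly preferred at the start, which is the kind of setting where incentives would be needed to steer users toward the better arm.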