Incentivized Bandit Learning with Self-Reinforcing User Preferences

Authors: Tianchen Zhou, Jia Liu, Chaosheng Dong, Jingyuan Deng

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct numerical simulations to demonstrate and verify the performances of these two policies and study their robustness under various settings.
Researcher Affiliation | Collaboration | 1 Department of Electrical and Computer Engineering, The Ohio State University, Columbus, Ohio, USA; 2 Amazon, Seattle, Washington, USA.
Pseudocode | Yes | Policy 1: At-Least-n Explore-Then-Commit ... Policy 2: The UCB-List Policy (illustrative sketches of both policies appear below the table)
Open Source Code | No | The paper does not provide any statements or links indicating that the source code for the methodology is openly available.
Open Datasets | No | The paper conducts numerical simulations with defined parameters rather than using a publicly available dataset. It sets up simulation parameters like "a two-armed model with means µ = [0.3, 0.5] and initial biases θ = [100, 1]" but does not refer to a publicly accessible dataset.
Dataset Splits | No | The paper conducts numerical simulations with defined parameters rather than using a publicly available dataset with explicit train/validation/test splits.
Hardware Specification | No | The paper mentions "numerical simulations" but does not provide any specific details about the hardware used to run these simulations (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks) used for the simulations or implementation.
Experiment Setup | Yes | The simulation setting is as follows: a two-armed model with means µ = [0.3, 0.5] and initial biases θ = [100, 1], the feedback function F(x) = x^α with α = 1.5, payment b = 1.5, and an incentive impact function G(x, t) = x. We use the optimal ALn-ETC parameter q = 15. (An illustrative simulation sketch using these parameters follows below.)
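
To make the quoted setup concrete, the following Python sketch wires the stated parameters (means µ = [0.3, 0.5], initial biases θ = [100, 1], feedback F(x) = x^1.5, payment b = 1.5, incentive impact G(x, t) = x) into a minimal self-reinforcing two-armed environment. The class name SelfReinforcingBandit, the choice rule (arm i is picked with probability proportional to F of its accumulated preference, boosted additively by G(b, t) on the incentivized arm), and the reinforcement update are assumptions made for illustration; they are not a verbatim reproduction of the paper's user model.

```python
import numpy as np

class SelfReinforcingBandit:
    """Two-armed simulation environment using the parameters quoted above.
    The choice rule and preference update are illustrative assumptions only."""

    def __init__(self, mu=(0.3, 0.5), theta=(100.0, 1.0), alpha=1.5, b=1.5, seed=0):
        self.mu = np.asarray(mu, dtype=float)        # Bernoulli reward means
        self.theta = np.asarray(theta, dtype=float)  # accumulated user preferences (initial biases)
        self.alpha = alpha                           # exponent in F(x) = x**alpha
        self.b = b                                   # per-round payment when incentivizing
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def F(self, x):
        return x ** self.alpha                       # feedback function F(x) = x^1.5

    def G(self, x, t):
        return x                                     # incentive impact function G(x, t) = x

    def step(self, recommended_arm, pay=False):
        """One round: the user picks an arm, a Bernoulli reward is drawn, and the
        chosen arm's preference is reinforced by the realized reward."""
        self.t += 1
        weights = self.F(self.theta)
        if pay:
            weights[recommended_arm] += self.G(self.b, self.t)  # assumed additive incentive boost
        probs = weights / weights.sum()
        chosen = int(self.rng.choice(len(self.mu), p=probs))
        reward = float(self.rng.binomial(1, self.mu[chosen]))
        self.theta[chosen] += reward                 # self-reinforcement of the chosen arm (assumed update)
        cost = self.b if (pay and chosen == recommended_arm) else 0.0
        return chosen, reward, cost
```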
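
The first policy named in the pseudocode row, At-Least-n Explore-Then-Commit, can be sketched generically as: keep paying to steer the user toward under-sampled arms until every arm has been chosen at least n times, then commit to the empirically best arm with no further payments. The function below follows that generic explore-then-commit pattern against the environment sketch above; the paper's exact exploration schedule and the precise role of the quoted parameter q = 15 may differ.

```python
import numpy as np

def at_least_n_etc(env, horizon, n=15):
    """Generic explore-then-commit sketch: pay to explore until every arm has been
    chosen at least n times, then recommend the empirically best arm without payment.
    Setting n = 15 mirrors the quoted q = 15; the correspondence is an assumption."""
    k = len(env.mu)
    counts = np.zeros(k, dtype=int)
    reward_sums = np.zeros(k)
    total_reward, total_cost = 0.0, 0.0
    for _ in range(horizon):
        under_explored = np.flatnonzero(counts < n)
        if under_explored.size > 0:
            # exploration phase: incentivize the least-sampled arm
            arm, pay = int(under_explored[np.argmin(counts[under_explored])]), True
        else:
            # commit phase: recommend the empirically best arm, no payment
            arm, pay = int(np.argmax(reward_sums / counts)), False
        chosen, reward, cost = env.step(arm, pay=pay)
        counts[chosen] += 1
        reward_sums[chosen] += reward
        total_reward += reward
        total_cost += cost
    return total_reward, total_cost
```

A run under the quoted setting would then look like `at_least_n_etc(SelfReinforcingBandit(), horizon=10_000, n=15)`; the 10,000-round horizon is an arbitrary choice for illustration, not a value taken from the paper.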
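
The second policy in the pseudocode row is the UCB-List policy. Its list-maintenance and incentive logic are not reproduced in this excerpt, so the snippet below only shows the standard UCB1 index that such confidence-bound policies are typically built on, as a hedged reference point rather than the paper's algorithm.

```python
import numpy as np

def ucb1_index(reward_sums, counts, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus.
    Unplayed arms get an infinite index so they are tried first."""
    means = reward_sums / np.maximum(counts, 1)
    bonus = np.sqrt(2.0 * np.log(max(t, 1)) / np.maximum(counts, 1))
    return np.where(counts == 0, np.inf, means + bonus)
```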