Incentivized Bandit Learning with Self-Reinforcing User Preferences

Authors: Tianchen Zhou, Jia Liu, Chaosheng Dong, Jingyuan Deng

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct numerical simulations to demonstrate and verify the performances of these two policies and study their robustness under various settings.
Researcher Affiliation | Collaboration | 1 Department of Electrical and Computer Engineering, The Ohio State University, Columbus, Ohio, USA; 2 Amazon, Seattle, Washington, USA.
Pseudocode | Yes | Policy 1: At-Least-n Explore-Then-Commit ... Policy 2: The UCB-List Policy (illustrative sketches of both policies appear below the table)
Open Source Code | No | The paper does not provide any statements or links indicating that the source code for the methodology is openly available.
Open Datasets | No | The paper conducts numerical simulations with defined parameters rather than using a publicly available dataset. It sets up simulation parameters like "a two-armed model with means µ = [0.3, 0.5] and initial biases θ = [100, 1]" but does not refer to a publicly accessible dataset.
Dataset Splits | No | The paper conducts numerical simulations with defined parameters rather than using a publicly available dataset with explicit train/validation/test splits.
Hardware Specification | No | The paper mentions "numerical simulations" but does not provide any specific details about the hardware used to run these simulations (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks) used for the simulations or implementation.
Experiment Setup | Yes | The simulation setting is as follows: a two-armed model with means µ = [0.3, 0.5] and initial biases θ = [100, 1], the feedback function F(x) = x^α with α = 1.5, payment b = 1.5, and an incentive impact function G(x, t) = x. We use the optimal ALn-ETC parameter q = 15. (An illustrative simulation sketch using these parameters follows below.)
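
To make the quoted setup concrete, the following Python sketch wires the stated parameters (means µ = [0.3, 0.5], initial biases θ = [100, 1], feedback F(x) = x^1.5, payment b = 1.5, incentive impact G(x, t) = x) into a minimal self-reinforcing two-armed environment. The class name SelfReinforcingBandit, the choice rule (arm i is picked with probability proportional to F of its accumulated preference, boosted additively by G(b, t) on the incentivized arm), and the reinforcement update are assumptions made for illustration; they are not a verbatim reproduction of the paper's user model.

```python
import numpy as np

class SelfReinforcingBandit:
    """Two-armed simulation environment using the parameters quoted above.
    The choice rule and preference update are illustrative assumptions only."""

    def __init__(self, mu=(0.3, 0.5), theta=(100.0, 1.0), alpha=1.5, b=1.5, seed=0):
        self.mu = np.asarray(mu, dtype=float)        # Bernoulli reward means
        self.theta = np.asarray(theta, dtype=float)  # accumulated user preferences (initial biases)
        self.alpha = alpha                           # exponent in F(x) = x**alpha
        self.b = b                                   # per-round payment when incentivizing
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def F(self, x):
        return x ** self.alpha                       # feedback function F(x) = x^1.5

    def G(self, x, t):
        return x                                     # incentive impact function G(x, t) = x

    def step(self, recommended_arm, pay=False):
        """One round: the user picks an arm, a Bernoulli reward is drawn, and the
        chosen arm's preference is reinforced by the realized reward."""
        self.t += 1
        weights = self.F(self.theta)
        if pay:
            weights[recommended_arm] += self.G(self.b, self.t)  # assumed additive incentive boost
        probs = weights / weights.sum()
        chosen = int(self.rng.choice(len(self.mu), p=probs))
        reward = float(self.rng.binomial(1, self.mu[chosen]))
        self.theta[chosen] += reward                 # self-reinforcement of the chosen arm (assumed update)
        cost = self.b if (pay and chosen == recommended_arm) else 0.0
        return chosen, reward, cost
```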
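
The first policy named in the pseudocode row, At-Least-n Explore-Then-Commit, can be sketched generically as: keep paying to steer the user toward under-sampled arms until every arm has been chosen at least n times, then commit to the empirically best arm with no further payments. The function below follows that generic explore-then-commit pattern against the environment sketch above; the paper's exact exploration schedule and the precise role of the quoted parameter q = 15 may differ.

```python
import numpy as np

def at_least_n_etc(env, horizon, n=15):
    """Generic explore-then-commit sketch: pay to explore until every arm has been
    chosen at least n times, then recommend the empirically best arm without payment.
    Setting n = 15 mirrors the quoted q = 15; the correspondence is an assumption."""
    k = len(env.mu)
    counts = np.zeros(k, dtype=int)
    reward_sums = np.zeros(k)
    total_reward, total_cost = 0.0, 0.0
    for _ in range(horizon):
        under_explored = np.flatnonzero(counts < n)
        if under_explored.size > 0:
            # exploration phase: incentivize the least-sampled arm
            arm, pay = int(under_explored[np.argmin(counts[under_explored])]), True
        else:
            # commit phase: recommend the empirically best arm, no payment
            arm, pay = int(np.argmax(reward_sums / counts)), False
        chosen, reward, cost = env.step(arm, pay=pay)
        counts[chosen] += 1
        reward_sums[chosen] += reward
        total_reward += reward
        total_cost += cost
    return total_reward, total_cost
```

A run under the quoted setting would then look like `at_least_n_etc(SelfReinforcingBandit(), horizon=10_000, n=15)`; the 10,000-round horizon is an arbitrary choice for illustration, not a value taken from the paper.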
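
The second policy in the pseudocode row is the UCB-List policy. Its list-maintenance and incentive logic are not reproduced in this excerpt, so the snippet below only shows the standard UCB1 index that such confidence-bound policies are typically built on, as a hedged reference point rather than the paper's algorithm.

```python
import numpy as np

def ucb1_index(reward_sums, counts, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus.
    Unplayed arms get an infinite index so they are tried first."""
    means = reward_sums / np.maximum(counts, 1)
    bonus = np.sqrt(2.0 * np.log(max(t, 1)) / np.maximum(counts, 1))
    return np.where(counts == 0, np.inf, means + bonus)
```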