Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

Authors: Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we support our theoretical results by simulations of strategic arm behavior which confirm the effectiveness and robustness of our proposed incentive design."
Researcher Affiliation | Collaboration | Thomas Kleine Buening¹, Aadirupa Saha², Christos Dimitrakakis³, Haifeng Xu⁴ (¹The Alan Turing Institute, ²TTIC, ³University of Neuchâtel, ⁴University of Chicago). Author is currently with Apple ML Research.
Pseudocode | Yes | Mechanism 1: UCB with Screening (UCB-S)

 1  initialize: A_0 = [K]
 2  for t = 1, ..., T do
 3      if A_{t−1} ≠ ∅ then
 4          Select i_t ∈ argmax_{i ∈ A_{t−1}} µ̄_i^{t−1}
 5      else
 6          Select i_t uniformly at random from [K]
 7      Arm i_t is clicked with probability s_{i_t}, i.e., c_{t,i_t} ∼ Bern(s_{i_t})
 8      if i_t was clicked (c_{t,i_t} = 1) then
 9          Observe post-click reward r_{t,i_t}
10      if s_{i_t}^t < min_{µ ∈ [µ̲_{i_t}^t, µ̄_{i_t}^t]} s*(µ)  or  s_{i_t}^t > max_{µ ∈ [µ̲_{i_t}^t, µ̄_{i_t}^t]} s*(µ) then
11          Ignore arm i_t in future rounds: A_t ← A_{t−1} \ {i_t}
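The screening rule above can be sketched in a few lines of Python. This is a minimal illustrative simulation, not the paper's implementation: the confidence radii, the UCB index, and the function names (`ucb_s`, `s_star`) are assumptions chosen for the sketch, with λ = 5 as in the paper's setup.

```python
import math
import random

def s_star(mu, lam=5.0):
    """Learner's desired (utility-maximizing) click-rate strategy s*(mu)."""
    return (1 + 1 / (2 * lam)) * mu

def ucb_s(true_s, true_mu, T=5000, lam=5.0, seed=0):
    """Sketch of UCB with Screening (UCB-S).

    true_s[i]  : arm i's (fixed) click-rate strategy
    true_mu[i] : arm i's true mean post-click reward
    Returns the set of arms still active after T rounds.
    The confidence radii below are illustrative, not the paper's constants.
    """
    rng = random.Random(seed)
    K = len(true_s)
    active = set(range(K))
    pulls = [0] * K         # times arm i was shown
    clicks = [0] * K        # clicks observed on arm i
    reward_sum = [0.0] * K  # post-click rewards observed on arm i

    for _ in range(T):
        if active:
            # Optimistic (UCB) selection on the estimated post-click reward.
            def ucb(i):
                if clicks[i] == 0:
                    return float("inf")
                return reward_sum[i] / clicks[i] + math.sqrt(2 * math.log(T) / clicks[i])
            it = max(active, key=ucb)
        else:
            it = rng.randrange(K)

        pulls[it] += 1
        if rng.random() < true_s[it]:            # click ~ Bern(s_{i_t})
            clicks[it] += 1
            reward_sum[it] += float(rng.random() < true_mu[it])

        # Screening: eliminate an arm whose observed click rate is
        # incompatible with s*(mu) for every mu in its confidence interval.
        if it in active and clicks[it] > 0:
            s_hat = clicks[it] / pulls[it]
            mu_hat = reward_sum[it] / clicks[it]
            mu_rad = math.sqrt(2 * math.log(T) / clicks[it])
            s_rad = math.sqrt(2 * math.log(T) / pulls[it])
            lo = s_star(max(0.0, mu_hat - mu_rad), lam)
            hi = s_star(min(1.0, mu_hat + mu_rad), lam)
            if s_hat + s_rad < lo or s_hat - s_rad > hi:
                active.discard(it)
    return active
```

With two arms of equal post-click reward µ = 0.2, one playing the desired strategy s*(0.2) = 0.22 and one playing blatant clickbait s = 1, the honest arm stays active while the clickbait arm is screened out.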
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper describes simulations of strategic arm behavior with a defined utility function and parameters (e.g., λ = 5) but does not use any specific public or open dataset for training or evaluation. The platforms mentioned (Amazon, YouTube, Airbnb, etc.) are examples of applications, not datasets.
Dataset Splits | No | The paper describes simulations and an experimental setup but does not specify any dataset splits (e.g., training/validation/test percentages or counts) or refer to predefined splits of a public dataset.
Hardware Specification | No | The paper describes the experimental setup for the simulations but gives no details about the hardware (e.g., GPU/CPU models, memory) used to run them.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | Experimental Setup. We consider the earlier introduced utility function u(s, µ) = sµ − λ(s − µ)², so that the desired (learner's utility-maximizing) strategy given µ is s*(µ) = (1 + 1/(2λ))µ. We let λ = 5. To model the strategic behavior of arms in response to UCB-S, we let the strategic arms interact with the mechanism over the course of 20 epochs (x-axis) and model each arm's strategic behavior via gradient ascent w.r.t. its utility v_i. More precisely, after every epoch (i.e., interaction over T = 50k rounds), each arm performs an approximated gradient step with respect to its utility v_i. We initialized the arm strategies to s_i = 1; however, our experiments show that other initializations, such as s_i = 0 or s_i = 0.5, yield similar results. All results are averaged over 10 complete runs, and the standard deviation is shown as a shaded region.
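The closed form s*(µ) = (1 + 1/(2λ))µ follows from maximizing u(s, µ) = sµ − λ(s − µ)² in s, and the gradient-ascent dynamics can be sketched as below. This is a hedged illustration of the ascent on u itself, with the step size `lr` chosen by us; the paper's arms instead take approximated gradient steps on their own utility v_i against the deployed mechanism, which this sketch does not model.

```python
def u(s, mu, lam=5.0):
    """Learner's utility u(s, mu) = s*mu - lam*(s - mu)**2 from the setup."""
    return s * mu - lam * (s - mu) ** 2

def s_star(mu, lam=5.0):
    """Maximizer of u in s: du/ds = mu - 2*lam*(s - mu) = 0
    gives s*(mu) = (1 + 1/(2*lam)) * mu."""
    return (1 + 1 / (2 * lam)) * mu

def gradient_ascent(mu, lam=5.0, lr=0.05, epochs=200, s0=1.0):
    """Gradient ascent on s -> u(s, mu), initialized at s0 = 1 as in the paper."""
    s = s0
    for _ in range(epochs):
        s += lr * (mu - 2 * lam * (s - mu))  # exact gradient du/ds
    return s
```

Since the update contracts by a factor |1 − 2λ·lr|, any step size 0 < lr < 1/λ = 0.2 converges; e.g., gradient_ascent(0.5) approaches s*(0.5) = 0.55.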