Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation
Authors: Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we support our theoretical results by simulations of strategic arm behavior which confirm the effectiveness and robustness of our proposed incentive design. |
| Researcher Affiliation | Collaboration | Thomas Kleine Buening (The Alan Turing Institute), Aadirupa Saha (TTIC), Christos Dimitrakakis (University of Neuchâtel), Haifeng Xu (University of Chicago). One author is noted as currently with Apple ML Research. |
| Pseudocode | Yes | Mechanism 1: UCB with Screening (UCB-S)<br>1: initialize A₀ = [K]<br>2: for t = 1, …, T do<br>3: &nbsp;&nbsp;if A_{t−1} ≠ ∅ then<br>4: &nbsp;&nbsp;&nbsp;&nbsp;select i_t ∈ argmax_{i ∈ A_{t−1}} µ̄^{t−1}_i<br>5: &nbsp;&nbsp;else<br>6: &nbsp;&nbsp;&nbsp;&nbsp;select i_t uniformly at random from [K]<br>7: &nbsp;&nbsp;arm i_t is clicked with probability s_{i_t}, i.e., c_{t,i_t} ~ Bern(s_{i_t})<br>8: &nbsp;&nbsp;if i_t was clicked (c_{t,i_t} = 1) then<br>9: &nbsp;&nbsp;&nbsp;&nbsp;observe post-click reward r_{t,i_t}<br>10: &nbsp;&nbsp;&nbsp;&nbsp;if s_{i_t} < min_{µ ∈ [µ̲^t_{i_t}, µ̄^t_{i_t}]} s*(µ) or s_{i_t} > max_{µ ∈ [µ̲^t_{i_t}, µ̄^t_{i_t}]} s*(µ) then<br>11: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ignore arm i_t in future rounds: A_t ← A_{t−1} \ {i_t} |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes 'simulations' of strategic arm behavior with a defined utility function and parameters (e.g., λ=5) but does not refer to using any specific public or open dataset for training or evaluation. The platforms mentioned (Amazon, Youtube, Airbnb, etc.) are examples of applications, not datasets used. |
| Dataset Splits | No | The paper describes 'simulations' and an 'Experimental Setup' but does not specify any dataset splits (e.g., training, validation, test percentages or counts) or refer to predefined splits from a public dataset. |
| Hardware Specification | No | The paper describes the experimental setup for simulations but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run these simulations. |
| Software Dependencies | No | The paper describes the experimental setup for simulations but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | Experimental Setup. We consider the earlier introduced utility function defined as u(s, µ) = sµ − λ(s − µ)², such that the desired (learner's utility-maximizing) strategy given µ is s*(µ) = (1 + 1/(2λ))µ. We let λ = 5. To model the strategic behavior of arms in response to UCB-S, we let the strategic arms interact with the mechanism over the course of 20 epochs (x-axis) and model each arm's strategic behavior via gradient ascent w.r.t. its utility v_i. More precisely, after every epoch (i.e., interaction over T = 50k rounds), each arm performs an approximated gradient step with respect to its utility v_i. We initialized the arm strategies to s_i = 1; however, our experiments show that other initializations, such as s_i = 0 or s_i = 0.5, yield similar results. All results are averaged over 10 complete runs, with the standard deviation shown in shaded color. |
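Since the paper releases no code, the screening mechanism and the epoch-wise gradient-ascent dynamics quoted above can be sketched as follows. This is a minimal reconstruction, not the authors' implementation: the function names (`ucb_s_epoch`, `strategic_dynamics`), the Bernoulli post-click rewards, the specific confidence-interval width, and the use of an arm's click frequency as its utility proxy v_i are all assumptions for illustration; only s*(µ) = (1 + 1/(2λ))µ, λ = 5, the s_i = 1 initialization, and the epoch/round counts come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb_s_epoch(s, mu, T=50_000, lam=5.0, delta=0.01):
    """One epoch of a UCB-with-Screening (UCB-S) style mechanism.

    s[i]: arm i's click-rate strategy; mu[i]: arm i's mean post-click reward.
    Returns each arm's average utility (click frequency) over the epoch.
    """
    K = len(s)
    active = set(range(K))               # arms not yet screened out
    n = np.zeros(K)                      # number of observed post-click rewards
    r_sum = np.zeros(K)                  # sum of observed post-click rewards
    clicks = np.zeros(K)                 # assumed arm-utility proxy: total clicks
    s_star = lambda m: (1 + 1 / (2 * lam)) * m   # learner-desired strategy s*(mu)
    for _ in range(T):
        if active:                       # optimistic choice among unscreened arms
            ucb = np.where(n > 0,
                           r_sum / np.maximum(n, 1)
                           + np.sqrt(2 * np.log(T) / np.maximum(n, 1)),
                           np.inf)
            i = max(active, key=lambda j: ucb[j])
        else:                            # all arms screened: select uniformly
            i = int(rng.integers(K))
        if rng.random() < s[i]:          # arm i is clicked w.p. s_i
            clicks[i] += 1
            r_sum[i] += float(rng.random() < mu[i])  # Bernoulli post-click reward
            n[i] += 1
            # screening: is s_i consistent with s*(mu) for any plausible mu?
            # (assumed Hoeffding-style width; s* is increasing, so it suffices
            # to check the interval endpoints)
            width = np.sqrt(np.log(2 * K * T / delta) / (2 * n[i]))
            lo, hi = r_sum[i] / n[i] - width, r_sum[i] / n[i] + width
            if s[i] < s_star(lo) or s[i] > s_star(hi):
                active.discard(i)        # ignore arm i in future rounds
    return clicks / T

def strategic_dynamics(mu, epochs=20, T=50_000, lam=5.0, eps=0.05, lr=0.5):
    """Arms adapt s via approximated gradient ascent on their utility,
    one finite-difference step per epoch (step size and perturbation assumed)."""
    s = np.ones(len(mu))                 # initialization s_i = 1, as in the paper
    for _ in range(epochs):
        v = ucb_s_epoch(s, mu, T=T, lam=lam)
        v_eps = ucb_s_epoch(np.clip(s + eps, 0.0, 1.0), mu, T=T, lam=lam)
        s = np.clip(s + lr * (v_eps - v) / eps, 0.0, 1.0)
    return s
```

With this sketch, `strategic_dynamics(np.array([0.4, 0.6]))` simulates 20 epochs of 50k rounds each; arms whose click rates drift far above s*(µ) get screened out, receive no further clicks, and are thereby pushed back toward the desired strategy.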