Rebounding Bandits for Modeling Satiation Effects

Authors: Liu Leqi, Fatma Kılınç-Karzan, Zachary C. Lipton, Alan L. Montgomery

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We now evaluate the performance of EEP experimentally, separately investigating the sample efficiency of our proposed estimators (10) for learning the satiation and reward models (Figure 2) and the computational performance of the w-lookahead policies (5) (Figure 3a).'
Researcher Affiliation | Academia | Liu Leqi, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, leqi@cs.cmu.edu; Fatma Kılınç-Karzan, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213, fkilinc@andrew.cmu.edu; Zachary C. Lipton, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, zlipton@cmu.edu; Alan L. Montgomery, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213, alanmontgomery@cmu.edu
Pseudocode | Yes | Algorithm 1: w-lookahead Explore-Estimate-Plan (a rough structural sketch of this loop appears below the table)
Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | No | The experiments use a simulated setup with defined parameters rather than a publicly available dataset. 'For the experimental setup, we have 5 arms with satiation retention factors γ1 = γ2 = .5, γ3 = .6, γ4 = .7, γ5 = .8, exposure influence factors λ1 = 1, λ2 = λ3 = 3, λ4 = λ5 = 2, base rewards b1 = 2, b2 = 3, b3 = 4, b4 = 2, b5 = 10, and noise with variance σz = 0.1.'
Dataset Splits | No | The paper describes experiments based on a simulated environment and does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper does not specify the hardware used for running the experiments.
Software Dependencies | Yes | 'In order to solve the resulting integer programs, we use Gurobi 9.1 [23] and set the number of threads for solving the problem to 10.' (a minimal thread-count configuration sketch appears below the table)
Experiment Setup | Yes | 'For the experimental setup, we have 5 arms with satiation retention factors γ1 = γ2 = .5, γ3 = .6, γ4 = .7, γ5 = .8, exposure influence factors λ1 = 1, λ2 = λ3 = 3, λ4 = λ5 = 2, base rewards b1 = 2, b2 = 3, b3 = 4, b4 = 2, b5 = 10, and noise with variance σz = 0.1.'; 'For each horizon T, we examine the w-step lookahead regret of w-lookahead EEP where w = 2, 5, 8, 10.' (a simulator sketch using these parameters appears below the table)
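
The Experiment Setup and Open Datasets rows quote the same simulated 5-arm configuration. Below is a minimal Python sketch of such a simulator using those quoted parameters; the dynamics themselves (satiation decaying by the retention factor γ, increasing by the exposure influence λ when an arm is pulled, Gaussian noise of scale 0.1, and observed reward equal to base reward minus current satiation) are an assumption standing in for the paper's exact model equations.

```python
import numpy as np

# Minimal simulator sketch for the 5-arm setup quoted above.
# The dynamics below (decay by gamma, bump by lambda on a pull, additive
# Gaussian noise, reward = base reward minus satiation) are an assumption
# and may differ in detail from the paper's equations.
rng = np.random.default_rng(0)

gamma = np.array([0.5, 0.5, 0.6, 0.7, 0.8])   # satiation retention factors
lam   = np.array([1.0, 3.0, 3.0, 2.0, 2.0])   # exposure influence factors
base  = np.array([2.0, 3.0, 4.0, 2.0, 10.0])  # base rewards
sigma_z = 0.1                                  # noise scale

satiation = np.zeros(5)

def pull(arm):
    """Pull one arm, update satiation for all arms, return the observed reward."""
    global satiation
    exposure = np.zeros(5)
    exposure[arm] = 1.0
    # assumed dynamics: s_t = gamma * (s_{t-1} + lambda * u_{t-1}) + noise
    satiation = gamma * (satiation + lam * exposure) + sigma_z * rng.standard_normal(5)
    return base[arm] - satiation[arm]

# Example: pulling arm 4 (base reward 10) repeatedly shows rewards decaying
# as its satiation builds up.
print([round(pull(4), 2) for _ in range(5)])
```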
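The Pseudocode row names Algorithm 1, w-lookahead Explore-Estimate-Plan. The sketch below mirrors only that three-phase structure; the exploration schedule, the crude per-arm mean "estimator", and the greedy "planner" are stand-ins for the paper's estimators (10) and w-lookahead integer program (5), not reproductions of them.

```python
import numpy as np

def explore_estimate_plan(pull, n_arms, horizon, w, explore_per_arm=10):
    """Structural sketch only: explore, then estimate, then plan in blocks of w.

    The estimator and planner below are deliberately simplistic stand-ins;
    the paper instead estimates satiation retention, exposure influence, and
    base rewards, and plans by solving a w-lookahead integer program.
    """
    rewards = {a: [] for a in range(n_arms)}
    t = 0

    # 1) Explore: pull each arm several times in a row.
    for a in range(n_arms):
        for _ in range(explore_per_arm):
            rewards[a].append(pull(a))
            t += 1

    # 2) Estimate: here, just the per-arm average observed reward.
    est = {a: float(np.mean(rewards[a])) for a in range(n_arms)}

    # 3) Plan: commit to w pulls at a time until the horizon is reached
    #    (a trivial greedy rule instead of the paper's lookahead program).
    while t < horizon:
        best = max(est, key=est.get)
        for a in [best] * min(w, horizon - t):
            rewards[a].append(pull(a))
            t += 1
    return est
```

With the simulator sketch above, `explore_estimate_plan(pull, n_arms=5, horizon=200, w=5)` runs end to end; again, this illustrates only the control flow of Algorithm 1, not its actual estimators or planner.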
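The Software Dependencies row reports that the lookahead integer programs are solved with Gurobi 9.1 using 10 threads. The snippet below shows only how that thread count can be set through gurobipy; the model, variables, and objective are hypothetical placeholders and do not reproduce the paper's w-lookahead formulation.

```python
import gurobipy as gp
from gurobipy import GRB

# Placeholder model: variables and objective are hypothetical and do NOT
# reproduce the paper's w-lookahead integer program.
w, n_arms = 5, 5
base = [2, 3, 4, 2, 10]  # base rewards from the quoted experiment setup

m = gp.Model("w_lookahead_sketch")
m.Params.Threads = 10  # matches the 10 threads reported in the paper

# Binary variable x[t, a]: pull arm a at lookahead step t.
x = m.addVars(w, n_arms, vtype=GRB.BINARY, name="pull")
m.addConstrs((x.sum(t, "*") == 1 for t in range(w)), name="one_arm_per_step")

# Stand-in linear objective using only base rewards (it ignores satiation,
# which the paper's formulation accounts for).
m.setObjective(
    gp.quicksum(base[a] * x[t, a] for t in range(w) for a in range(n_arms)),
    GRB.MAXIMIZE,
)
m.optimize()
```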