Rebounding Bandits for Modeling Satiation Effects

Authors: Liu Leqi, Fatma Kılınç-Karzan, Zachary C. Lipton, Alan L. Montgomery

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We now evaluate the performance of EEP experimentally, separately investigating the sample efficiency of our proposed estimators (10) for learning the satiation and reward models (Figure 2) and the computational performance of the w-lookahead policies (5) (Figure 3a).'
Researcher Affiliation | Academia | Liu Leqi, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, leqi@cs.cmu.edu; Fatma Kılınç-Karzan, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213, fkilinc@andrew.cmu.edu; Zachary C. Lipton, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, zlipton@cmu.edu; Alan L. Montgomery, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213, alanmontgomery@cmu.edu
Pseudocode | Yes | Algorithm 1: w-lookahead Explore-Estimate-Plan (a rough structural sketch of this loop appears below the table)
Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | No | The experiments use a simulated setup with defined parameters rather than a publicly available dataset. 'For the experimental setup, we have 5 arms with satiation retention factors γ1 = γ2 = .5, γ3 = .6, γ4 = .7, γ5 = .8, exposure influence factors λ1 = 1, λ2 = λ3 = 3, λ4 = λ5 = 2, base rewards b1 = 2, b2 = 3, b3 = 4, b4 = 2, b5 = 10, and noise with variance σz = 0.1.'
Dataset Splits | No | The paper describes experiments based on a simulated environment and does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper does not specify the hardware used for running the experiments.
Software Dependencies | Yes | 'In order to solve the resulting integer programs, we use Gurobi 9.1 [23] and set the number of threads for solving the problem to 10.' (a minimal thread-count configuration sketch appears below the table)
Experiment Setup | Yes | 'For the experimental setup, we have 5 arms with satiation retention factors γ1 = γ2 = .5, γ3 = .6, γ4 = .7, γ5 = .8, exposure influence factors λ1 = 1, λ2 = λ3 = 3, λ4 = λ5 = 2, base rewards b1 = 2, b2 = 3, b3 = 4, b4 = 2, b5 = 10, and noise with variance σz = 0.1.'; 'For each horizon T, we examine the w-step lookahead regret of w-lookahead EEP where w = 2, 5, 8, 10.' (a simulator sketch using these parameters appears below the table)
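
The Experiment Setup and Open Datasets rows quote the same simulated 5-arm configuration. Below is a minimal Python sketch of such a simulator using those quoted parameters; the dynamics themselves (satiation decaying by the retention factor γ, increasing by the exposure influence λ when an arm is pulled, Gaussian noise of scale 0.1, and observed reward equal to base reward minus current satiation) are an assumption standing in for the paper's exact model equations.

```python
import numpy as np

# Minimal simulator sketch for the 5-arm setup quoted above.
# The dynamics below (decay by gamma, bump by lambda on a pull, additive
# Gaussian noise, reward = base reward minus satiation) are an assumption
# and may differ in detail from the paper's equations.
rng = np.random.default_rng(0)

gamma = np.array([0.5, 0.5, 0.6, 0.7, 0.8])   # satiation retention factors
lam   = np.array([1.0, 3.0, 3.0, 2.0, 2.0])   # exposure influence factors
base  = np.array([2.0, 3.0, 4.0, 2.0, 10.0])  # base rewards
sigma_z = 0.1                                  # noise scale

satiation = np.zeros(5)

def pull(arm):
    """Pull one arm, update satiation for all arms, return the observed reward."""
    global satiation
    exposure = np.zeros(5)
    exposure[arm] = 1.0
    # assumed dynamics: s_t = gamma * (s_{t-1} + lambda * u_{t-1}) + noise
    satiation = gamma * (satiation + lam * exposure) + sigma_z * rng.standard_normal(5)
    return base[arm] - satiation[arm]

# Example: pulling arm 4 (base reward 10) repeatedly shows rewards decaying
# as its satiation builds up.
print([round(pull(4), 2) for _ in range(5)])
```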
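The Pseudocode row names Algorithm 1, w-lookahead Explore-Estimate-Plan. The sketch below mirrors only that three-phase structure; the exploration schedule, the crude per-arm mean "estimator", and the greedy "planner" are stand-ins for the paper's estimators (10) and w-lookahead integer program (5), not reproductions of them.

```python
import numpy as np

def explore_estimate_plan(pull, n_arms, horizon, w, explore_per_arm=10):
    """Structural sketch only: explore, then estimate, then plan in blocks of w.

    The estimator and planner below are deliberately simplistic stand-ins;
    the paper instead estimates satiation retention, exposure influence, and
    base rewards, and plans by solving a w-lookahead integer program.
    """
    rewards = {a: [] for a in range(n_arms)}
    t = 0

    # 1) Explore: pull each arm several times in a row.
    for a in range(n_arms):
        for _ in range(explore_per_arm):
            rewards[a].append(pull(a))
            t += 1

    # 2) Estimate: here, just the per-arm average observed reward.
    est = {a: float(np.mean(rewards[a])) for a in range(n_arms)}

    # 3) Plan: commit to w pulls at a time until the horizon is reached
    #    (a trivial greedy rule instead of the paper's lookahead program).
    while t < horizon:
        best = max(est, key=est.get)
        for a in [best] * min(w, horizon - t):
            rewards[a].append(pull(a))
            t += 1
    return est
```

With the simulator sketch above, `explore_estimate_plan(pull, n_arms=5, horizon=200, w=5)` runs end to end; again, this illustrates only the control flow of Algorithm 1, not its actual estimators or planner.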
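The Software Dependencies row reports that the lookahead integer programs are solved with Gurobi 9.1 using 10 threads. The snippet below shows only how that thread count can be set through gurobipy; the model, variables, and objective are hypothetical placeholders and do not reproduce the paper's w-lookahead formulation.

```python
import gurobipy as gp
from gurobipy import GRB

# Placeholder model: variables and objective are hypothetical and do NOT
# reproduce the paper's w-lookahead integer program.
w, n_arms = 5, 5
base = [2, 3, 4, 2, 10]  # base rewards from the quoted experiment setup

m = gp.Model("w_lookahead_sketch")
m.Params.Threads = 10  # matches the 10 threads reported in the paper

# Binary variable x[t, a]: pull arm a at lookahead step t.
x = m.addVars(w, n_arms, vtype=GRB.BINARY, name="pull")
m.addConstrs((x.sum(t, "*") == 1 for t in range(w)), name="one_arm_per_step")

# Stand-in linear objective using only base rewards (it ignores satiation,
# which the paper's formulation accounts for).
m.setObjective(
    gp.quicksum(base[a] * x[t, a] for t in range(w) for a in range(n_arms)),
    GRB.MAXIMIZE,
)
m.optimize()
```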