Rebounding Bandits for Modeling Satiation Effects
Authors: Liu Leqi, Fatma Kılınç-Karzan, Zachary C. Lipton, Alan L. Montgomery
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now evaluate the performance of EEP experimentally, separately investigating the sample efficiency of our proposed estimators (10) for learning the satiation and reward models (Figure 2) and the computational performance of the w-lookahead policies (5) (Figure 3a). |
| Researcher Affiliation | Academia | Liu Leqi Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15213 leqi@cs.cmu.edu; Fatma Kılınç-Karzan Tepper School of Business Carnegie Mellon University Pittsburgh, PA 15213 fkilinc@andrew.cmu.edu; Zachary C. Lipton Machine Learning Department Carnegie Mellon University Pittsburgh, PA 15213 zlipton@cmu.edu; Alan L. Montgomery Tepper School of Business Carnegie Mellon University Pittsburgh, PA 15213 alanmontgomery@cmu.edu |
| Pseudocode | Yes | Algorithm 1: w-lookahead Explore-Estimate-Plan |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | The experiments use a simulated setup with defined parameters rather than a publicly available dataset. 'For the experimental setup, we have 5 arms with satiation retention factors γ1 = γ2 = .5, γ3 = .6, γ4 = .7, γ5 = .8, exposure influence factors λ1 = 1, λ2 = λ3 = 3, λ4 = λ5 = 2, base rewards b1 = 2, b2 = 3, b3 = 4, b4 = 2, b5 = 10, and noise with variance σz = 0.1.' |
| Dataset Splits | No | The paper describes experiments based on a simulated environment and does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments. |
| Software Dependencies | Yes | In order to solve the resulting integer programs, we use Gurobi 9.1 [23] and set the number of threads for solving the problem to 10. |
| Experiment Setup | Yes | For the experimental setup, we have 5 arms with satiation retention factors γ1 = γ2 = .5, γ3 = .6, γ4 = .7, γ5 = .8, exposure influence factors λ1 = 1, λ2 = λ3 = 3, λ4 = λ5 = 2, base rewards b1 = 2, b2 = 3, b3 = 4, b4 = 2, b5 = 10, and noise with variance σz = 0.1.; For each horizon T, we examine the w-step lookahead regret of w-lookahead EEP where w = 2, 5, 8, 10. |
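
The "Experiment Setup" row lists concrete simulation parameters (satiation retention factors, exposure influence factors, base rewards, noise variance). The sketch below wires those quoted values into a minimal simulated 5-arm environment. The parameter values come from the quote; the satiation update rule (a linear recursion driven by pulls) and the `step` function are assumptions for illustration, since the paper's exact dynamics are not reproduced in this table.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = np.array([0.5, 0.5, 0.6, 0.7, 0.8])   # satiation retention factors (quoted)
lam = np.array([1.0, 3.0, 3.0, 2.0, 2.0])     # exposure influence factors (quoted)
b = np.array([2.0, 3.0, 4.0, 2.0, 10.0])      # base rewards (quoted)
sigma_z = 0.1                                  # noise variance (quoted)


def step(satiation, arm):
    """Assumed transition: reward = base reward - current satiation + noise,
    then satiation decays by gamma and grows by lambda for the pulled arm."""
    noise = rng.normal(0.0, np.sqrt(sigma_z))          # variance 0.1 as quoted
    reward = b[arm] - satiation[arm] + noise
    pulled = np.zeros_like(satiation)
    pulled[arm] = 1.0
    satiation = gamma * (satiation + lam * pulled)     # assumed linear dynamics
    return satiation, reward


satiation = np.zeros(5)
for t in range(20):
    arm = int(rng.integers(5))    # placeholder policy; EEP would plan here
    satiation, r = step(satiation, arm)
```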
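The "Software Dependencies" row reports only that the integer programs are solved with Gurobi 9.1 using 10 threads. The snippet below shows that configuration detail on a toy binary program via the standard `gurobipy` interface; the model itself is a placeholder, not the paper's w-lookahead integer program.

```python
import gurobipy as gp
from gurobipy import GRB

# Toy binary program; only the solver configuration mirrors the paper.
m = gp.Model("toy_ip")
m.Params.Threads = 10   # thread count reported in the paper

x = m.addVars(5, vtype=GRB.BINARY, name="x")
m.addConstr(gp.quicksum(x[i] for i in range(5)) <= 2, name="budget")
m.setObjective(gp.quicksum((i + 1) * x[i] for i in range(5)), GRB.MAXIMIZE)
m.optimize()
```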