Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Rebounding Bandits for Modeling Satiation Effects
Authors: Liu Leqi, Fatma Kılınç-Karzan, Zachary C. Lipton, Alan L. Montgomery
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now evaluate the performance of EEP experimentally, separately investigating the sample efficiency of our proposed estimators (10) for learning the satiation and reward models (Figure 2) and the computational performance of the w-lookahead policies (5) (Figure 3a). |
| Researcher Affiliation | Academia | Liu Leqi, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213; Fatma Kılınç-Karzan, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213; Zachary C. Lipton, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213; Alan L. Montgomery, Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213 |
| Pseudocode | Yes | Algorithm 1: w-lookahead Explore-Estimate-Plan |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | The experiments use a simulated setup with defined parameters rather than a publicly available dataset. 'For the experimental setup, we have 5 arms with satiation retention factors γ1 = γ2 = .5, γ3 = .6, γ4 = .7, γ5 = .8, exposure influence factors λ1 = 1, λ2 = λ3 = 3, λ4 = λ5 = 2, base rewards b1 = 2, b2 = 3, b3 = 4, b4 = 2, b5 = 10, and noise with variance σz = 0.1.' |
| Dataset Splits | No | The paper describes experiments based on a simulated environment and does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments. |
| Software Dependencies | Yes | In order to solve the resulting integer programs, we use Gurobi 9.1 [23] and set the number of threads for solving the problem to 10. |
| Experiment Setup | Yes | For the experimental setup, we have 5 arms with satiation retention factors γ1 = γ2 = .5, γ3 = .6, γ4 = .7, γ5 = .8, exposure influence factors λ1 = 1, λ2 = λ3 = 3, λ4 = λ5 = 2, base rewards b1 = 2, b2 = 3, b3 = 4, b4 = 2, b5 = 10, and noise with variance σz = 0.1. For each horizon T, we examine the w-step lookahead regret of w-lookahead EEP where w = 2, 5, 8, 10. |
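
Because the experiments run on a simulated environment, the quoted parameters are enough to sketch a small simulator. The Python below is a minimal sketch, not the authors' code. The update rule is an assumption, since this report does not state the dynamics: each arm's satiation is taken to evolve as s ← γ(s + u) + z, where u = 1 if the arm was just pulled and z is Gaussian noise, and the reward for pulling an arm is b − λs. The names `simulate`, `GAMMA`, `LAM`, `BASE`, and `NOISE_VAR` are hypothetical, and arms are indexed from 0.

```python
import random

# Parameters quoted in the Experiment Setup row (5 arms, 0-indexed here).
GAMMA = [0.5, 0.5, 0.6, 0.7, 0.8]   # satiation retention factors
LAM = [1.0, 3.0, 3.0, 2.0, 2.0]     # exposure influence factors
BASE = [2.0, 3.0, 4.0, 2.0, 10.0]   # base rewards
NOISE_VAR = 0.1                     # quoted noise level, treated as a variance


def simulate(arms, noise_var=NOISE_VAR, seed=0):
    """Return the rewards obtained by pulling `arms` in order.

    Assumed dynamics (not stated in the report):
        reward   = b_k - lam_k * s_k               for the pulled arm k
        s_k_next = gamma_k * (s_k + u_k) + z_k     for every arm k
    with u_k = 1 for the pulled arm, 0 otherwise, and z_k ~ N(0, noise_var).
    """
    rng = random.Random(seed)
    std = noise_var ** 0.5
    satiation = [0.0] * len(BASE)
    rewards = []
    for arm in arms:
        # Reward depends on the satiation level before this pull.
        rewards.append(BASE[arm] - LAM[arm] * satiation[arm])
        # Every arm's satiation decays; only the pulled arm gains exposure.
        for k in range(len(satiation)):
            pulled = 1.0 if k == arm else 0.0
            satiation[k] = GAMMA[k] * (satiation[k] + pulled) + rng.gauss(0.0, std)
    return rewards
```

With `noise_var=0.0` the rollout is deterministic. Under these assumed dynamics, `simulate([4, 0, 4], noise_var=0.0)` gives a higher third reward than `simulate([4, 4, 4], noise_var=0.0)`, because resting the arm for one step lets its satiation decay.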