Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Regulating Greed Over Time in Multi-Armed Bandits
Authors: Stefano Tracà, Cynthia Rudin, Weiyu Yan
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical comparisons, both in a simulated environment (Section 4 and Appendix G) and in a real-data environment (Section 6 and Appendix H). |
| Researcher Affiliation | Academia | Stefano Tracà, Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Cynthia Rudin, Department of Computer Science, Duke University, Durham, NC 27708, USA; Weiyu Yan, Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA |
| Pseudocode | Yes | Algorithm 1: ε-z greedy algorithm; Algorithm 2: Soft ε-greedy algorithm; Algorithm 3: UCB-z algorithm; Algorithm 4: Soft UCB algorithm; Algorithm 5: Variable arm pool algorithm; Algorithm 6: Smarter version of the standard ε-greedy algorithm; Algorithm 7: Smarter version of the standard UCB algorithm; Algorithm 8: Soft UCB mortal algorithm; Algorithm 9: Smarter version of the UCB-L algorithm |
| Open Source Code | Yes | A Python implementation of the algorithms is available online. The implementation used in the simulated environment is available at https://github.com/5tefan0/Regulating-Greed-Over-Time; the implementation used in the real-data setting is available at https://github.com/ShrekFelix/Regulating-Greed-Over-Time. |
| Open Datasets | Yes | The ideas in this paper were inspired by a high-scoring entry in the Exploration and Exploitation 3 Phase 1 data mining competition, where the goal was to build a better recommendation system for Yahoo! Front Page news articles using the event log data from the Yahoo! Webscope program. |
| Dataset Splits | No | For the ε-greedy algorithms and the variable pool algorithm, different articles are chosen during each run because, during exploration phases, articles are chosen at random. For those, one slice of the dataset is used (the time period between the two blue lines in Figure 12), which is the longest slice with the largest number of available arms. Also, since the offline evaluation consumes records very quickly (all records for which the algorithm did not choose the article recommended by the website must be discarded, because no label is available), this slice was duplicated to increase the size of the dataset. The UCB algorithms are deterministic in their choices because they always pick the arm with the best upper confidence bound; running the UCB family many times on the same dataset will always give the same result. Therefore, a sliding window was used to cover different portions of the dataset when evaluating the performance of the UCB algorithms for each game; the randomness of their final rewards arises from the different portions of the dataset used. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, only mentioning 'simulated environment' and 'real-data environment'. |
| Software Dependencies | No | A Python implementation of the algorithms is available online, but no specific versions of Python or of any libraries are mentioned. |
| Experiment Setup | Yes | Table 13 (parameters of the algorithms used in the simulation): ε-z Greedy: z = 31; Variable Pool: c = 10, z = 31; UCB-L: c = 0.011; UCB-z: z = 31. The algorithms were run 100 times to derive an empirical distribution of the rewards. For the ε-greedy algorithms and the variable pool algorithm, different articles are chosen during each run because, during exploration phases, they are chosen at random. |
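The offline evaluation described in the Dataset Splits row discards every logged record on which the policy's choice differs from the article the website actually showed, since no reward label exists for other arms. A minimal sketch of that replay procedure (function names and the policy interface are illustrative assumptions, not the paper's code):

```python
def replay_evaluate(policy_choose, policy_update, records, arms):
    """Offline replay evaluation: a logged (shown_arm, reward) record is
    usable only when the policy picks the same arm that was shown;
    all other records are discarded because no label is available."""
    total_reward, used = 0.0, 0
    for shown_arm, reward in records:
        chosen = policy_choose(arms)
        if chosen != shown_arm:
            continue  # no counterfactual label -> discard this record
        total_reward += reward
        used += 1
        policy_update(chosen, reward)
    return total_reward, used
```

Because most records are discarded, this evaluator "consumes records very quickly", which is why the report notes that the chosen slice of the Yahoo! log was duplicated to enlarge the dataset.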
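The Experiment Setup row reports hyperparameters (e.g., z = 31) and 100 repeated runs to build an empirical reward distribution. As a rough illustration only, and not the authors' ε-z greedy algorithm, the sketch below shows a plain ε-greedy bandit whose exploration rate varies over time; the `decaying_eps` schedule is a hypothetical stand-in for the paper's greed-regulation mechanism (Algorithm 1):

```python
import random

def run_epsilon_greedy(true_means, horizon, epsilon_schedule, rng):
    """Plain epsilon-greedy on Bernoulli arms with a time-varying
    exploration rate. epsilon_schedule(t, horizon) is a hypothetical
    stand-in for the paper's greed-regulation rule, not its actual form."""
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    total_reward = 0.0
    for t in range(horizon):
        if rng.random() < epsilon_schedule(t, horizon):
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        total_reward += reward
    return total_reward

def decaying_eps(t, horizon):
    # Illustrative schedule: explore early, become greedy near the end,
    # echoing the idea of regulating greed over time.
    return max(0.01, 1.0 - t / horizon)

# One run; the paper's protocol repeats such runs 100 times to form an
# empirical distribution of the rewards.
rng = random.Random(0)
total = run_epsilon_greedy([0.2, 0.8], 1000, decaying_eps, rng)
```

Repeating the call with fresh seeds and collecting `total` across runs mirrors the 100-run empirical-distribution protocol described above.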