Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Regulating Greed Over Time in Multi-Armed Bandits
Authors: Stefano Tracà, Cynthia Rudin, Weiyu Yan
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical comparisons, both in a simulated environment (Section 4 and Appendix G) and in a real-data environment (Section 6 and Appendix H). |
| Researcher Affiliation | Academia | Stefano Tracà, Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Cynthia Rudin, Department of Computer Science, Duke University, Durham, NC 27708, USA; Weiyu Yan, Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA |
| Pseudocode | Yes | Algorithm 1: ε-z greedy algorithm; Algorithm 2: Soft ε-greedy algorithm; Algorithm 3: UCB-z algorithm; Algorithm 4: Soft UCB algorithm; Algorithm 5: Variable arm pool algorithm; Algorithm 6: Smarter version of the standard ε-greedy algorithm; Algorithm 7: Smarter version of the standard UCB algorithm; Algorithm 8: Soft UCB mortal algorithm; Algorithm 9: Smarter version of the UCB-L algorithm |
| Open Source Code | Yes | A Python implementation of the algorithms is available online. The implementation used in the simulated environment is available at https://github.com/5tefan0/Regulating-Greed-Over-Time; the implementation used in the real-data setting is available at https://github.com/ShrekFelix/Regulating-Greed-Over-Time. |
| Open Datasets | Yes | The ideas in this paper were inspired by a high-scoring entry in the Exploration and Exploitation 3 Phase 1 data mining competition, where the goal was to build a better recommendation system for Yahoo! Front Page news articles using the event log data from the Yahoo! Webscope program. |
| Dataset Splits | No | For the ε-greedy algorithms and the variable pool algorithm, different articles are chosen during each run because, during exploration phases, articles are chosen at random. For those, one slice of the dataset is used (the time period between the two blue lines in Figure 12), which is the longest slice with the largest number of available arms. Also, since the offline evaluation consumes records very quickly (all records for which the algorithm did not choose the article recommended by the website must be discarded, because no label is available), this slice was duplicated to increase the size of the dataset. The UCB algorithms are deterministic in their choices because they always pick the arm with the best upper confidence bound; running the UCB family many times on the same dataset will always give the same result. Therefore, a sliding window was used to cover different portions of the dataset when evaluating the performance of the UCB algorithms for each game; the randomness of their final rewards arises from the different portions of the dataset used. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, only mentioning 'simulated environment' and 'real-data environment'. |
| Software Dependencies | No | A Python implementation of the algorithms is available online, but no specific versions of Python or of any libraries are mentioned. |
| Experiment Setup | Yes | Table 13 (parameters of the algorithms used in the simulation): ε-z Greedy: z = 31; Variable Pool: c = 10, z = 31; UCB-L: c = 0.011; UCB-z: z = 31. The algorithms were run 100 times to derive an empirical distribution of the rewards. For the ε-greedy algorithms and the variable pool algorithm, different articles are chosen during each run because, during exploration phases, they are chosen at random. |
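The offline evaluation described in the Dataset Splits row discards every logged record on which the policy's choice differs from the article the website actually showed, since no reward label exists for other arms. A minimal sketch of that replay procedure (function names and the policy interface are illustrative assumptions, not the paper's code):

```python
def replay_evaluate(policy_choose, policy_update, records, arms):
    """Offline replay evaluation: a logged (shown_arm, reward) record is
    usable only when the policy picks the same arm that was shown;
    all other records are discarded because no label is available."""
    total_reward, used = 0.0, 0
    for shown_arm, reward in records:
        chosen = policy_choose(arms)
        if chosen != shown_arm:
            continue  # no counterfactual label -> discard this record
        total_reward += reward
        used += 1
        policy_update(chosen, reward)
    return total_reward, used
```

Because most records are discarded, this evaluator "consumes records very quickly", which is why the report notes that the chosen slice of the Yahoo! log was duplicated to enlarge the dataset.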
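The Experiment Setup row reports hyperparameters (e.g., z = 31) and 100 repeated runs to build an empirical reward distribution. As a rough illustration only, and not the authors' ε-z greedy algorithm, the sketch below shows a plain ε-greedy bandit whose exploration rate varies over time; the `decaying_eps` schedule is a hypothetical stand-in for the paper's greed-regulation mechanism (Algorithm 1):

```python
import random

def run_epsilon_greedy(true_means, horizon, epsilon_schedule, rng):
    """Plain epsilon-greedy on Bernoulli arms with a time-varying
    exploration rate. epsilon_schedule(t, horizon) is a hypothetical
    stand-in for the paper's greed-regulation rule, not its actual form."""
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    total_reward = 0.0
    for t in range(horizon):
        if rng.random() < epsilon_schedule(t, horizon):
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        total_reward += reward
    return total_reward

def decaying_eps(t, horizon):
    # Illustrative schedule: explore early, become greedy near the end,
    # echoing the idea of regulating greed over time.
    return max(0.01, 1.0 - t / horizon)

# One run; the paper's protocol repeats such runs 100 times to form an
# empirical distribution of the rewards.
rng = random.Random(0)
total = run_epsilon_greedy([0.2, 0.8], 1000, decaying_eps, rng)
```

Repeating the call with fresh seeds and collecting `total` across runs mirrors the 100-run empirical-distribution protocol described above.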