Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Linear Bandits with Memory

Authors: Giulia Clerici, Pierre Laforgue, Nicolò Cesa-Bianchi

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we complement our theoretical results with experiments comparing our approach to natural baselines. ... Empirically, our algorithm outperforms natural baselines, such as the oracle greedy strategy (playing the action with the best instantaneous expected reward) and a naive block learning approach. Our experimental results also include misspecified settings, where we learn θ and simultaneously either m or γ. ... We perform experiments to validate the theoretical performance of OM and O3M (Algorithm 1).
Researcher Affiliation	Academia	Giulia Clerici EMAIL Department of Computer Science, University of Milan, Italy. Pierre Laforgue EMAIL Department of Computer Science, University of Milan, Italy. Nicolò Cesa-Bianchi EMAIL Department of Computer Science, University of Milan, Italy; DEIB, Politecnico di Milano, Italy.
Pseudocode	Yes	Algorithm 1 OFUL-memory (OM, O3M) ... Algorithm 2 Bandit Combiner on O3M
Open Source Code	Yes	The code is written in Python and it is publicly available at the following GitHub repository: Linear Bandits with Memory.
Open Datasets	No	Similarly to (Warlop et al., 2018), we work with synthetic data because of the counterfactual nature of the learning problem in bandits.
Dataset Splits	No	The paper uses synthetic data and does not describe any specific training/test/validation splits for reproduction.
Hardware Specification	No	The paper does not provide any specific hardware details used for running the experiments.
Software Dependencies	No	The code is written in Python and it is publicly available at the following GitHub repository: Linear Bandits with Memory. However, the paper does not specify a Python version or any other software dependencies with version numbers.
Experiment Setup	Yes	Unless stated otherwise, we set d = 3 while θ ∈ R^d is generated uniformly at random with unit norm. The rewards are generated according to (1) and (2), and perturbed by Gaussian noise with standard deviation σ = 1/10. ... In Figure 2 (left pane) we compare the performance of O3M against oracle greedy, vanilla OFUL, and two instances of Bandit Combiner (Algorithm 2). The first instance, Combiner γ, works in the setting where the misspecified parameter is γ and the algorithm is run over the set {−4, −3, −2, −1, 0} of possible values for γ with the true value being −3. The second instance, Combiner m, tests the setting where the misspecified parameter is m. In this case the algorithm is run over the set {0, 2, 3} of possible values for m with the true value being 2. ... We start by analyzing the rotting scenario with m = 2 and γ = −3.
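
The sampling steps quoted in the Experiment Setup row (a unit-norm parameter drawn uniformly at random in dimension d = 3, and rewards perturbed by Gaussian noise with σ = 1/10) could be sketched roughly as below. This is an illustration only, not the authors' code: the function names, the seed, and the placeholder expected reward are assumptions, and the reward model of equations (1) and (2) is not reproduced because it is not given on this page.

```python
import math
import random


def sample_unit_vector(d, rng):
    """Draw a vector uniformly at random from the unit sphere in R^d.

    Normalizing a vector of i.i.d. standard Gaussians yields a direction
    that is uniformly distributed on the sphere.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def perturb(expected_reward, rng, sigma=0.1):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return expected_reward + rng.gauss(0.0, sigma)


# Hypothetical usage: the seed and the expected reward are placeholders.
rng = random.Random(0)
theta = sample_unit_vector(3, rng)   # d = 3, as in the quoted setup
noisy_reward = perturb(0.5, rng)     # sigma = 1/10, as in the quoted setup
```

Fixing the seed, as done here, is one simple way to make such synthetic runs repeatable; the page does not say whether the original experiments do so.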