Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Linear Bandits with Memory
Authors: Giulia Clerici, Pierre Laforgue, Nicolò Cesa-Bianchi
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we complement our theoretical results with experiments comparing our approach to natural baselines. ... Empirically, our algorithm outperforms natural baselines, such as the oracle greedy strategy (playing the action with the best instantaneous expected reward) and a naive block learning approach. Our experimental results also include misspecified settings, where we learn θ and simultaneously either m or γ. ... We perform experiments to validate the theoretical performance of OM and O3M (Algorithm 1). |
| Researcher Affiliation | Academia | Giulia Clerici EMAIL Department of Computer Science, University of Milan, Italy. Pierre Laforgue EMAIL Department of Computer Science, University of Milan, Italy. Nicolò Cesa-Bianchi EMAIL Department of Computer Science, University of Milan, Italy DEIB, Politecnico di Milano, Italy. |
| Pseudocode | Yes | Algorithm 1 OFUL-memory (OM, O3M) ... Algorithm 2 Bandit Combiner on O3M |
| Open Source Code | Yes | The code is written in Python and it is publicly available at the following GitHub repository: Linear Bandits with Memory. |
| Open Datasets | No | Similarly to (Warlop et al., 2018), we work with synthetic data because of the counterfactual nature of the learning problem in bandits. |
| Dataset Splits | No | The paper uses synthetic data and does not describe any specific training/test/validation splits for reproduction. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The code is written in Python and it is publicly available at the following GitHub repository: Linear Bandits with Memory. However, it does not specify a Python version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Unless stated otherwise, we set d = 3 while θ ∈ ℝ^d is generated uniformly at random with unit norm. The rewards are generated according to (1) and (2), and perturbed by Gaussian noise with standard deviation σ = 1/10. ... In Figure 2 (left pane) we compare the performance of O3M against oracle greedy, vanilla OFUL, and two instances of Bandit Combiner (Algorithm 2). The first instance, Combiner γ, works in the setting where the misspecified parameter is γ and the algorithm is run over the set { 4, 3, 2, 1, 0} of possible values for γ with the true value being 3. The second instance, Combiner m, tests the setting where the misspecified parameter is m. In this case the algorithm is run over the set {0, 2, 3} of possible values for m with the true value being 2. ... We start by analyzing the rotting scenario with m = 2 and γ = 3. |
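
The synthetic setup quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the reward is a plain linear reward standing in for the paper's equations (1) and (2), which are not reproduced on this page. Only d = 3, the unit-norm θ, and σ = 1/10 come from the quoted text.

```python
import math
import random


def sample_unit_vector(d, rng):
    """Draw a vector uniformly at random from the unit sphere in R^d.

    Normalizing a standard Gaussian vector gives a uniform direction.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:  # guard against the (measure-zero) all-zero draw
            return [x / norm for x in v]


def noisy_linear_reward(action, theta, sigma, rng):
    """ASSUMPTION: a plain linear reward <action, theta> plus Gaussian noise.

    The paper's memory-dependent reward model (its equations (1) and (2))
    is not shown here, so this is only a placeholder for it.
    """
    mean = sum(a * t for a, t in zip(action, theta))
    return mean + rng.gauss(0.0, sigma)


rng = random.Random(0)      # fixed seed, chosen here for repeatability
D = 3                       # dimension d = 3, as quoted
SIGMA = 0.1                 # noise standard deviation sigma = 1/10, as quoted

theta = sample_unit_vector(D, rng)
action = sample_unit_vector(D, rng)   # hypothetical unit-norm action
reward = noisy_linear_reward(action, theta, SIGMA, rng)
```

Fixing the seed is a choice made for this sketch; the quoted text does not say how the authors seeded their runs.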