Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Linear Bandits with Stochastic Delayed Feedback
Authors: Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brückner
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model, assumptions and results are validated by experiments on simulated and real data. |
| Researcher Affiliation | Collaboration | ¹DeepMind, London, UK; ²Otto-von-Guericke Universität, Magdeburg, Germany; ³Amazon, Berlin, Germany. |
| Pseudocode | Yes | Algorithm 1 OTFLinUCB |
| Open Source Code | Yes | The code for all data analysis and simulations is available at https://sites.google.com/view/bandits-delayed-feedback |
| Open Datasets | Yes | the more recent dataset released by (Diemert et al., 2017) features heavy-tailed delays, despite being sourced from a similar online marketing problem in the same company. ... https://ailab.criteo.com/criteo-attribution-modeling-biddingdataset/ |
| Dataset Splits | No | The paper describes a sequential learning setting and uses a 'window parameter' for feedback, but does not specify explicit training, validation, and test dataset splits as commonly found in supervised learning setups. |
| Hardware Specification | No | The paper mentions running simulations but does not provide any specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'Scipy library' for plotting, but it does not provide specific version numbers for Scipy or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | We arbitrarily choose d = 5, K = 10. We fix the horizon to T = 3000, and we choose a geometric delay distribution with mean µ = E[D_t] ∈ {100, 500}. In a real setting, this would correspond to an experiment that lasts 3h, with average delays of 6 and 30 minutes. The online interaction with the environment is simulated: we fix θ = (1/d, . . . , 1/d) and at each round we sample and normalize K actions from {0, 1}^d. All results are averaged over 50 independent runs. |
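
The simulated environment quoted in the Experiment Setup row can be sketched roughly as follows. This is not the authors' code and it does not implement OTFLinUCB. The Bernoulli reward model, the L2 normalization, the resampling of all-zero actions, the uniformly random placeholder policy, and the names `sample_actions` and `simulate` are all assumptions made for illustration. Only d, K, T, θ, and the geometric delay means come from the quoted text.

```python
import numpy as np


def sample_actions(rng, K, d):
    """Draw K vectors from {0, 1}^d and scale each to unit L2 norm."""
    actions = rng.integers(0, 2, size=(K, d)).astype(float)
    # Assumption: all-zero draws are resampled so that normalization is defined.
    zero = ~actions.any(axis=1)
    while zero.any():
        actions[zero] = rng.integers(0, 2, size=(int(zero.sum()), d))
        zero = ~actions.any(axis=1)
    return actions / np.linalg.norm(actions, axis=1, keepdims=True)


def simulate(T=3000, d=5, K=10, mean_delay=100, seed=0):
    """Simulate T rounds; return the rewards and their feedback delays."""
    rng = np.random.default_rng(seed)
    theta = np.full(d, 1.0 / d)  # theta = (1/d, ..., 1/d), as in the quoted setup
    rewards = np.empty(T, dtype=int)
    delays = np.empty(T, dtype=int)
    for t in range(T):
        actions = sample_actions(rng, K, d)
        # Placeholder policy (assumption): pick an action uniformly at random.
        action = actions[rng.integers(K)]
        # Assumption: Bernoulli reward with mean <action, theta>, which lies in [0, 1].
        rewards[t] = rng.binomial(1, action @ theta)
        # NumPy's geometric distribution has support {1, 2, ...} and mean 1/p.
        delays[t] = rng.geometric(1.0 / mean_delay)
    return rewards, delays


if __name__ == "__main__":
    for mu in (100, 500):
        rewards, delays = simulate(mean_delay=mu)
        # Feedback for round t arrives at round t + delays[t]; count arrivals before T.
        arrived = (np.arange(len(delays)) + delays) < len(delays)
        print(f"mu={mu}: empirical mean delay {delays.mean():.1f}, "
              f"{arrived.mean():.0%} of rewards observed within the horizon")
```

Averaging over 50 independent runs, as the quoted setup describes, would amount to calling `simulate` with 50 different seeds and averaging the metric of interest.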