Linear Bandits with Stochastic Delayed Feedback
Authors: Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brückner
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model, assumptions and results are validated by experiments on simulated and real data. |
| Researcher Affiliation | Collaboration | 1DeepMind, London, UK; 2Otto-von-Guericke-Universität, Magdeburg, Germany; 3Amazon, Berlin, Germany. |
| Pseudocode | Yes | Algorithm 1 OTFLinUCB |
| Open Source Code | Yes | The code for all data analysis and simulations is available at https://sites.google.com/view/bandits-delayed-feedback |
| Open Datasets | Yes | the more recent dataset released by (Diemert et al., 2017) features heavy-tailed delays, despite being sourced from a similar online marketing problem in the same company. ... https://ailab.criteo.com/criteo-attribution-modeling-bidding-dataset/ |
| Dataset Splits | No | The paper describes a sequential learning setting and uses a 'window parameter' for feedback, but does not specify explicit training, validation, and test dataset splits as commonly found in supervised learning setups. |
| Hardware Specification | No | The paper mentions running simulations but does not provide any specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'Scipy library' for plotting, but it does not provide specific version numbers for Scipy or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | We arbitrarily choose d = 5, K = 10. We fix the horizon to T = 3000, and we choose a geometric delay distribution with mean µ = E[D_t] ∈ {100, 500}. In a real setting, this would correspond to an experiment that lasts 3h, with average delays of 6 and 30 minutes. The online interaction with the environment is simulated: we fix θ = (1/d, . . . , 1/d) and at each round we sample and normalize K actions from {0, 1}^d. All results are averaged over 50 independent runs. |
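
For readers who want to sanity-check the Experiment Setup row, below is a minimal Python sketch of the reported simulation: d = 5, K = 10, T = 3000, θ = (1/d, …, 1/d), K binary actions sampled and normalized each round, and geometrically distributed delays with mean µ ∈ {100, 500}. It pairs the environment with a generic delayed-feedback LinUCB loop rather than the paper's exact OTFLinUCB, which additionally censors feedback with a window parameter m. All names here are ours rather than from the released code, and the Bernoulli reward model, regularizer, and exploration bonus are our assumptions.

```python
import numpy as np

# Minimal sketch of the reported setup, not the authors' released code.
# OTFLinUCB additionally censors feedback with a window parameter m;
# this loop simply applies feedback once its delay has elapsed.
rng = np.random.default_rng(0)

d, K, T = 5, 10, 3000        # dimension, actions per round, horizon
mu = 100                     # mean delay E[D_t]; the paper also uses 500
theta = np.full(d, 1.0 / d)  # unknown parameter, as stated in the setup

def sample_actions():
    """Sample K binary feature vectors and L2-normalize them."""
    A = rng.integers(0, 2, size=(K, d)).astype(float)
    norms = np.linalg.norm(A, axis=1)
    norms[norms == 0] = 1.0  # guard against the all-zero action
    return A / norms[:, None]

lam, alpha = 1.0, 1.0        # ridge regularizer and bonus width (our choices)
V = lam * np.eye(d)          # regularized design matrix
b = np.zeros(d)              # sum of observed rewards times features
pending = []                 # (arrival_round, features, reward)

for t in range(T):
    A = sample_actions()
    theta_hat = np.linalg.solve(V, b)
    Vinv = np.linalg.inv(V)
    ucb = A @ theta_hat + alpha * np.sqrt(np.einsum('ki,ij,kj->k', A, Vinv, A))
    x = A[np.argmax(ucb)]
    r = float(rng.random() < x @ theta)  # Bernoulli conversion (our assumption)
    pending.append((t + rng.geometric(1.0 / mu), x, r))  # geometric delay, mean mu
    # Fold in feedback whose delay has elapsed by round t.
    arrived = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, xs, rs in arrived:
        V += np.outer(xs, xs)
        b += rs * xs
```

To mirror the row, one would wrap this loop in 50 independent runs and average the cumulative regret; the paper's algorithm also involves the window parameter m, which this sketch deliberately omits.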