Linear Bandits with Stochastic Delayed Feedback

Authors: Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brückner

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model, assumptions and results are validated by experiments on simulated and real data.
Researcher Affiliation | Collaboration | 1DeepMind, London, UK; 2Otto-von-Guericke Universität, Magdeburg, Germany; 3Amazon, Berlin, Germany.
Pseudocode | Yes | Algorithm 1 OTFLinUCB
Open Source Code | Yes | The code for all data analysis and simulations is available at https://sites.google.com/view/bandits-delayed-feedback
Open Datasets | Yes | the more recent dataset released by (Diemert et al., 2017) features heavy-tailed delays, despite being sourced from a similar online marketing problem in the same company. ... https://ailab.criteo.com/criteo-attribution-modeling-bidding-dataset/
Dataset Splits | No | The paper describes a sequential learning setting and uses a 'window parameter' for feedback, but does not specify explicit training, validation, and test dataset splits as commonly found in supervised learning setups.
Hardware Specification | No | The paper mentions running simulations but does not provide any specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for the experiments.
Software Dependencies | No | The paper mentions using the 'Scipy library' for plotting, but it does not provide specific version numbers for Scipy or any other software dependencies needed to replicate the experiment.
Experiment Setup | Yes | We arbitrarily choose d = 5, K = 10. We fix the horizon to T = 3000, and we choose a geometric delay distribution with mean µ = E[Dt] ∈ {100, 500}. In a real setting, this would correspond to an experiment that lasts 3h, with average delays of 6 and 30 minutes. The online interaction with the environment is simulated: we fix θ = (1/d, . . . , 1/d) and at each round we sample and normalize K actions from {0, 1}^d. All results are averaged over 50 independent runs.
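
To make the quoted simulation setup concrete, below is a minimal Python sketch. The values d = 5, K = 10, T = 3000, the geometric delays with mean in {100, 500}, θ = (1/d, ..., 1/d) and the binary action sampling come from the setup above; the unit-norm normalisation, the plain LinUCB learner, and all helper names are assumptions made for illustration. In particular, this is not the paper's OTFLinUCB algorithm, which handles delayed and censored feedback via a window parameter.

```python
import numpy as np

# Parameters quoted from the experiment setup above; everything else is assumed.
d, K, T = 5, 10, 3000
mean_delay = 100                 # the paper also uses 500
theta = np.full(d, 1.0 / d)      # theta = (1/d, ..., 1/d)
rng = np.random.default_rng(0)

def sample_actions():
    # K random actions from {0, 1}^d, normalised (unit L2 norm is an assumption).
    A = rng.integers(0, 2, size=(K, d)).astype(float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return A / norms

# Ridge-regression statistics for a plain LinUCB learner (a stand-in, not OTFLinUCB).
V = np.eye(d)        # regularised design matrix
b = np.zeros(d)      # sum of action * observed reward
pending = []         # (round at which feedback arrives, action, reward)
total_reward = 0.0

for t in range(T):
    # Fold in feedback whose stochastic delay has now elapsed.
    arrived = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, x, r in arrived:
        V += np.outer(x, x)
        b += r * x

    # Optimistic (UCB) action selection among the K sampled actions.
    A = sample_actions()
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    widths = np.sqrt(np.einsum('ij,jk,ik->i', A, V_inv, A))
    x = A[np.argmax(A @ theta_hat + widths)]

    # Bernoulli reward, observed only after a geometric delay with mean ~ mean_delay.
    r = float(rng.binomial(1, min(1.0, float(x @ theta))))
    delay = int(rng.geometric(1.0 / mean_delay))
    pending.append((t + delay, x, r))
    total_reward += r

print(f"cumulative reward generated over T={T} rounds: {total_reward:.0f}")
```

The pending queue is the only delay-specific ingredient: rewards are generated at play time but only folded into the estimator once their sampled geometric delay has elapsed, so feedback still outstanding at the horizon is simply never observed.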