Linear Bandits with Stochastic Delayed Feedback

Authors: Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brückner

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model, assumptions and results are validated by experiments on simulated and real data.
Researcher Affiliation | Collaboration | 1DeepMind, London, UK; 2Otto-von-Guericke Universität, Magdeburg, Germany; 3Amazon, Berlin, Germany.
Pseudocode | Yes | Algorithm 1 OTFLinUCB
Open Source Code | Yes | The code for all data analysis and simulations is available at https://sites.google.com/view/bandits-delayed-feedback
Open Datasets | Yes | the more recent dataset released by (Diemert et al., 2017) features heavy-tailed delays, despite being sourced from a similar online marketing problem in the same company. ... https://ailab.criteo.com/criteo-attribution-modeling-bidding-dataset/
Dataset Splits | No | The paper describes a sequential learning setting and uses a 'window parameter' for feedback, but does not specify explicit training, validation, and test dataset splits as commonly found in supervised learning setups.
Hardware Specification | No | The paper mentions running simulations but does not provide any specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for the experiments.
Software Dependencies | No | The paper mentions using the 'Scipy library' for plotting, but it does not provide specific version numbers for Scipy or any other software dependencies needed to replicate the experiment.
Experiment Setup | Yes | We arbitrarily choose d = 5, K = 10. We fix the horizon to T = 3000, and we choose a geometric delay distribution with mean µ = E[Dt] ∈ {100, 500}. In a real setting, this would correspond to an experiment that lasts 3h, with average delays of 6 and 30 minutes. The online interaction with the environment is simulated: we fix θ = (1/d, . . . , 1/d) and at each round we sample and normalize K actions from {0, 1}^d. All results are averaged over 50 independent runs.
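
To make the quoted simulation setup concrete, below is a minimal Python sketch. The values d = 5, K = 10, T = 3000, the geometric delays with mean in {100, 500}, θ = (1/d, ..., 1/d) and the binary action sampling come from the setup above; the unit-norm normalisation, the plain LinUCB learner, and all helper names are assumptions made for illustration. In particular, this is not the paper's OTFLinUCB algorithm, which handles delayed and censored feedback via a window parameter.

```python
import numpy as np

# Parameters quoted from the experiment setup above; everything else is assumed.
d, K, T = 5, 10, 3000
mean_delay = 100                 # the paper also uses 500
theta = np.full(d, 1.0 / d)      # theta = (1/d, ..., 1/d)
rng = np.random.default_rng(0)

def sample_actions():
    # K random actions from {0, 1}^d, normalised (unit L2 norm is an assumption).
    A = rng.integers(0, 2, size=(K, d)).astype(float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return A / norms

# Ridge-regression statistics for a plain LinUCB learner (a stand-in, not OTFLinUCB).
V = np.eye(d)        # regularised design matrix
b = np.zeros(d)      # sum of action * observed reward
pending = []         # (round at which feedback arrives, action, reward)
total_reward = 0.0

for t in range(T):
    # Fold in feedback whose stochastic delay has now elapsed.
    arrived = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, x, r in arrived:
        V += np.outer(x, x)
        b += r * x

    # Optimistic (UCB) action selection among the K sampled actions.
    A = sample_actions()
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    widths = np.sqrt(np.einsum('ij,jk,ik->i', A, V_inv, A))
    x = A[np.argmax(A @ theta_hat + widths)]

    # Bernoulli reward, observed only after a geometric delay with mean ~ mean_delay.
    r = float(rng.binomial(1, min(1.0, float(x @ theta))))
    delay = int(rng.geometric(1.0 / mean_delay))
    pending.append((t + delay, x, r))
    total_reward += r

print(f"cumulative reward generated over T={T} rounds: {total_reward:.0f}")
```

The pending queue is the only delay-specific ingredient: rewards are generated at play time but only folded into the estimator once their sampled geometric delay has elapsed, so feedback still outstanding at the horizon is simply never observed.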