From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits

Authors: Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We finally provide numerical experiments showing the merits of DS in a decision-making problem on synthetic agriculture data." and "Finally, we study in Section 4 a use-case in agriculture using the DSSAT simulator (see Hoogenboom et al. (2019)), which naturally faces all the questions (robustness, model specification) that motivate this work and shows the merit of DS over state-of-the-art methods for this problem."
Researcher Affiliation | Academia | Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard (dorian.baudry@inria.fr, patrick.saux@inria.fr, odalric.maillard@inria.fr), Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
Pseudocode | Yes | Algorithm 1: Generic Dirichlet Sampling
Input: K arms, horizon T, Dirichlet Sampling index ~µ
Init: t = 1, r = 1; for each k ∈ {1, ..., K}: Xk = {Xk,1}, Nk = 1 (draw each arm once)
while t < T do
    A = {}  ; arm(s) to pull at the end of the round
    ℓ = Leader((X1, N1), ..., (XK, NK))  ; choose a leader
    for each k ∈ {1, ..., K} with Nk < Nℓ do  ; play the duels
        if max(µ̄(Xk), ~µ(Xk, Xℓ)) ≥ µ̄(Xℓ) then A = A ∪ {k}
    Draw the arms in A if A is non-empty, else draw arm ℓ.
    Update t, r, (Nk)k∈{1,...,K}, (Xk)k∈{1,...,K}  ; collect reward(s) and update data
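The duel structure of Algorithm 1 can be sketched in Python. This is an illustrative reading of the pseudocode, not the authors' code: the leader rule (most-pulled arm, ties broken by empirical mean) and the `dirichlet_index` instantiation (a Dirichlet re-weighting of the challenger's history) are assumptions standing in for the generic `Leader` routine and index ~µ.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_mean(x):
    """Empirical mean of the rewards observed for an arm."""
    return float(np.mean(x))

def dirichlet_index(x_k, rng):
    """One possible instantiation of the generic index ~µ: re-weight the
    challenger's observations with Dirichlet(1, ..., 1) weights and return
    the resulting random mean."""
    w = rng.dirichlet(np.ones(len(x_k)))
    return float(w @ np.asarray(x_k))

def ds_round(history, rng):
    """One round of generic Dirichlet Sampling.
    history: dict mapping arm -> list of observed rewards.
    Returns the list of arms to pull in this round."""
    # Leader: most-pulled arm, ties broken by empirical mean (assumed rule).
    leader = max(history, key=lambda k: (len(history[k]), empirical_mean(history[k])))
    to_pull = []
    for k in history:
        if len(history[k]) < len(history[leader]):  # only challengers duel
            idx = max(empirical_mean(history[k]), dirichlet_index(history[k], rng))
            if idx >= empirical_mean(history[leader]):
                to_pull.append(k)  # challenger wins its duel against the leader
    return to_pull if to_pull else [leader]
```

Repeating `ds_round` and appending the new rewards to `history` reproduces the while-loop of the pseudocode; each challenger is re-sampled via Dirichlet weights rather than a parametric posterior, which is the re-sampling idea the paper builds on.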
Open Source Code Yes The code to reproduce the experiments is available in this github repository. (Footnote links to https://github.com/dbaudry/dirichlet-sampling)
Open Datasets | No | "We consider a practical decision-making problem using the DSSAT simulator (Hoogenboom et al., 2019). Harnessing more than 30 years of expert knowledge, this simulator is calibrated on historical field data (soil measurements, genetics, planting date...) and generates realistic crop yields." (The paper uses a simulator to generate data, but the generated data itself is not explicitly made public with a specific link or citation.)
Dataset Splits | No | No explicit train/validation/test splits were provided; the experiments are based on simulations rather than fixed datasets.
Hardware Specification | No | "Our code does not require powerful computing resources; experiments are replicable with a laptop." and "Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr)." (No specific hardware models, e.g. CPU/GPU, are provided.)
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) were explicitly provided.
Experiment Setup | Yes | "Tuning: For BDS we choose the parameters ρ = 4, γ = 3500, corresponding to p ≥ 20% in the hypothesis of Theorem 3.4, which is conservative in our example. For QDS, we set ρ = 4 to be able to compare with BDS, and a 95% quantile. Finally, for RDS we choose ρn = √(log(1+n)), which enters into the theoretical framework of Theorem 3.7."
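The reported tuning can be written down as a small configuration sketch; the variable names below are illustrative, not taken from the authors' repository.

```python
import math

# Hedged sketch of the tuning reported in the paper.
bds_params = {"rho": 4, "gamma": 3500}     # BDS: corresponds to p >= 20% in Theorem 3.4
qds_params = {"rho": 4, "quantile": 0.95}  # QDS: same rho as BDS for comparability

def rds_rho(n):
    """RDS schedule: rho_n = sqrt(log(1 + n)), matching Theorem 3.7."""
    return math.sqrt(math.log(1 + n))
```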