Decentralized Cooperative Stochastic Bandits

Authors: David Martínez-Rubio, Varun Kanade, Patrick Rebeschini

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments. We show that the algorithm proposed in this work, DDUCB, not only enjoys a better theoretical regret guarantee but also performs better in practice. In general we have observed that the accelerated method performs well with the recommended values (i.e., no tuning) for the exploration parameter η and the parameter ε that measures the precision of the mixing after a stage. Recall these values are η = 2, ε = 1/22. On the other hand, the constant C that results for the unaccelerated method is usually excessively large, so it is convenient to decrease it heuristically, which corresponds to using a different value of ε. We set ε so that the value of C for the unaccelerated method is the same as the value of C for the accelerated one. We have used the recommended modification of DDUCB, which consists of adding to the variables α_i and a_i the information of the pulls performed, weighted by 1/N, while waiting for the vectors β_i and b_i to be mixed. This modification adds extra information that is at hand at virtually no computational cost, so it is always convenient to use it. We tuned γ, the exploration parameter of coop UCB [25], to get the best results for that algorithm and plot the executions for the best values of γ and also for γ = 2 for comparison purposes. In the figures one can observe that after a few stages the DDUCB algorithms learn with high precision which arm is best, and the regret curve observed afterwards is almost horizontal. After 10000 iterations, coop UCB not only accumulates greater regret, but the slope of its curve indicates that it still has not effectively learned which arm is best. See Appendix G for a more detailed description of the experiments.
Figure 1: Simulation of DDUCB and coop UCB for cycles (top) and square grids (bottom) for 100 nodes (left), 200 nodes (top right) and 225 nodes (bottom right).
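As a rough illustration of why the constant C for the unaccelerated method becomes excessively large, the minimal sketch below computes the stage length C = ⌈ln(2N/ε)/√(2 ln(1/|λ_2|))⌉ used by the accelerated method (Algorithm 1 in the paper) and, under the assumption that the unaccelerated analogue simply drops the square root, the corresponding unaccelerated stage length. The function names, the unaccelerated formula, and the example values of |λ_2| are our own illustration, not taken from the paper.

import math

def stage_length_accelerated(n_nodes, eps, lam2):
    # C = ceil( ln(2N/eps) / sqrt(2 ln(1/|lambda_2|)) ), as in Algorithm 1
    return math.ceil(math.log(2 * n_nodes / eps) / math.sqrt(2 * math.log(1 / abs(lam2))))

def stage_length_unaccelerated(n_nodes, eps, lam2):
    # Assumed analogue without acceleration: the square root is dropped, so the
    # stage length scales with 1/ln(1/|lambda_2|) instead of its square root.
    return math.ceil(math.log(2 * n_nodes / eps) / math.log(1 / abs(lam2)))

# On a large cycle |lambda_2| is close to 1, so the gap between the two is dramatic.
print(stage_length_accelerated(100, 1 / 22, 0.998))    # on the order of a hundred rounds
print(stage_length_unaccelerated(100, 1 / 22, 0.998))  # several thousand rounds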
Researcher Affiliation Academia David Martínez-Rubio, Department of Computer Science, University of Oxford, Oxford, United Kingdom, david.martinez@cs.ox.ac.uk; Varun Kanade, Department of Computer Science, University of Oxford, Oxford, United Kingdom, varunk@cs.ox.ac.uk; Patrick Rebeschini, Department of Statistics, University of Oxford, Oxford, United Kingdom, patrick.rebeschini@stats.ox.ac.uk
Pseudocode Yes Algorithm 1 DDUCB at node i.
1: ζ_i ← (X_i^1, . . . , X_i^K); z_i ← (1, . . . , 1)
2: C ← ⌈ln(2N/ε) / √(2 ln(1/|λ_2|))⌉
3: α_i ← ζ_i/N; a_i ← z_i/N; β_i ← ζ_i; b_i ← z_i
4: γ_i ← 0; c_i ← 0; δ_i ← 0; d_i ← 0
5: t ← K; s ← K
6: while t ≤ T do
7:   k ← arg max_k { α_i^k / a_i^k + √(·) }
8:   for r from 0 to C − 1 do
9:     u ← Play arm k, return reward
10:    γ_i^k ← γ_i^k + u; c_i^k ← c_i^k + 1
11:    β_i ← mix(β_i, r, i); b_i ← mix(b_i, r, i)
12:    t ← t + 1
13:    if t > T then return end if
14:   end for
15:   s ← (t − C)N
16:   δ_i ← δ_i + β_i; d_i ← d_i + b_i; α_i ← δ_i; a_i ← d_i
17:   β_i ← γ_i; b_i ← c_i; γ_i ← 0; c_i ← 0
18: end while

Algorithm 2 Accelerated communication and mixing step. mix(y_{r,i}, r, i)
1: if r is 0 then
2:   w_0 ← 1/2; w_{−1} ← 0
3:   y_{0,i} ← y_{0,i}/2; y_{−1,i} ← (0, . . . , 0)
4: end if
5: Send y_{r,i} to neighbors
6: Receive corresponding values y_{r,j}, j ∈ N(i)
7: y′_{r,i} ← Σ_{j∈N(i)} 2 P_{ij} y_{r,j} / |λ_2|
8: w_{r+1} ← 2 w_r / |λ_2| − w_{r−1}
9: y_{r+1,i} ← (w_r / w_{r+1}) y′_{r,i} − (w_{r−1} / w_{r+1}) y_{r−1,i}
10: if r is 0 then
11:   y_{0,i} ← 2 y_{0,i}; w_0 ← 2 w_0
12: end if
13: return y_{r+1,i}
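For reference, the following is a minimal NumPy sketch of Chebyshev-accelerated (semi-iterative) gossip, the kind of scheme that Algorithm 2 implements node by node. It updates all nodes at once through the full matrix P, uses the standard initialization in which only the weight w_0 is halved for the first step, and the function name chebyshev_gossip and the example network are our own; the paper's per-node bookkeeping may differ in its details.

import numpy as np

def chebyshev_gossip(P, y0, lam2, rounds):
    # P: symmetric, doubly stochastic gossip matrix (P_ij > 0 only for neighbors);
    # lam2: its second-largest eigenvalue in absolute value; y0: one value per node.
    # After r rounds the deviation from the network-wide average contracts like
    # 1/T_r(1/lam2), i.e. roughly exp(-r * sqrt(2 ln(1/lam2))).
    y_prev = np.zeros_like(np.asarray(y0, dtype=float))
    y_cur = np.asarray(y0, dtype=float)
    w_prev, w_cur = 0.0, 0.5  # T_{-1} and a halved T_0, so the first step yields T_1
    for r in range(rounds):
        w_next = 2.0 * w_cur / lam2 - w_prev          # Chebyshev weight recurrence
        y_next = ((w_cur / w_next) * (2.0 * (P @ y_cur) / lam2)
                  - (w_prev / w_next) * y_prev)        # accelerated mixing step
        if r == 0:
            w_cur = 1.0  # restore T_0 = 1 once the special first step is done
        y_prev, y_cur = y_cur, y_next
        w_prev, w_cur = w_cur, w_next
    return y_cur

# Usage: an 8-node cycle with uniform weights; iterates approach the true mean quickly.
n = 8
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = P[i, (i - 1) % n] = P[i, (i + 1) % n] = 1.0 / 3.0
lam2 = np.sort(np.abs(np.linalg.eigvalsh(P)))[-2]
y0 = np.random.randn(n)
print(chebyshev_gossip(P, y0, lam2, 15), y0.mean())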
Open Source Code No The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets No The paper describes simulations (Figure 1) but does not mention the use of a publicly available dataset or provide any access information (link, DOI, citation) for the data used in the simulations.
Dataset Splits No The paper describes simulations but does not specify any training, validation, or test dataset splits. It mentions running for "10000 iterations" but no data partitioning details.
Hardware Specification No The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies No The paper does not provide specific software dependency details with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x).
Experiment Setup Yes In general we have observed that the accelerated method performs well with the recommended values (i.e., no tuning) for the exploration parameter η and the parameter ε that measures the precision of the mixing after a stage. Recall these values are η = 2, ε = 1/22. On the other hand, the constant C that results for the unaccelerated method is usually excessively large, so it is convenient to decrease it heuristically, which corresponds to using a different value of ε. We set ε so that the value of C for the unaccelerated method is the same as the value of C for the accelerated one. We have used the recommended modification of DDUCB, which consists of adding to the variables α_i and a_i the information of the pulls performed, weighted by 1/N, while waiting for the vectors β_i and b_i to be mixed. This modification adds extra information that is at hand at virtually no computational cost, so it is always convenient to use it. We tuned γ, the exploration parameter of coop UCB [25], to get the best results for that algorithm and plot the executions for the best values of γ and also for γ = 2 for comparison purposes.
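The two tuning choices described above, matching the unaccelerated C to the accelerated one through ε and crediting local pulls to α_i and a_i with weight 1/N while β_i and b_i are still mixing, can be summarized in a few lines. The helper names below and the assumed relation C_unacc ≈ ln(2N/ε)/ln(1/|λ_2|) are our own illustration, not the paper's code.

import math

def eps_for_matching_c(n_nodes, lam2, c_target):
    # Assuming C_unacc = ceil( ln(2N/eps) / ln(1/|lambda_2|) ), invert the relation
    # to find the eps that gives the unaccelerated method the target stage length.
    return 2 * n_nodes * abs(lam2) ** c_target

def credit_local_pull(alpha_i, a_i, arm, reward, n_nodes):
    # Recommended modification: while beta_i, b_i are still mixing, each local pull
    # is also added to the running estimates alpha_i, a_i with weight 1/N.
    alpha_i[arm] += reward / n_nodes
    a_i[arm] += 1.0 / n_nodes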