Online Clustering of Bandits

Authors: Claudio Gentile, Shuai Li, Giovanni Zappella

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.
Researcher Affiliation Collaboration Claudio Gentile CLAUDIO.GENTILE@UNINSUBRIA.IT Di STA, University of Insubria, Italy Shuai Li SHUAILI.SLI@GMAIL.COM Di STA, University of Insubria, Italy Giovanni Zappella ZAPPELLA@AMAZON.COM Amazon Development Center Germany, Germany (Work done when the author was Ph D student at Univeristy of Milan)
Pseudocode Yes Figure 1. Pseudocode of the CLUB algorithm.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes Datasets and their full descriptions are available at www.grouplens.org/node/462. (...) We extracted two datasets from the one adopted by the ICML 2012 Exploration and Exploitation 3 Challenge 10 for news article recommendation. (...) 10 https://explochallenge.inria.fr/
Dataset Splits Yes On the synthetic datasets, as well as on the Last FM and Delicious datasets, we tuned these parameters by picking the best setting (as measured by cumulative regret) after the first t0 = 5,000 rounds, and then sticked to those values for the remaining T t0 = 50,000 rounds. (...) On the Yahoo dataset, this optimal tuning was done within the first t0 = 100,000 records, corresponding to a number of retained records between 4, 350 and 4, 450 across different algorithms.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup Yes We set ct = 10 for all t = 1, . . . , T, with time horizon T = 5, 000 + 50, 000, d = 25, and n = 500. (...) The payoff value associated with cluster vector uj and context vector xt,k has been generated by perturbing the inner product u j xt,k through an additive white noise term ǫ drawn uniformly at random across the interval [ σ, σ]. (...) CLUB was run with p = 3 log n / n , so as to be reasonably confident that the initial graph is at least connected. (...) All algorithms (but RAN) require parameter tuning: an exploration-exploitation tradeoff parameter which is common to all algorithms (in Figure 1, this is the α parameter), and the edge deletion parameter α2 in CLUB.