Linear Multi-Resource Allocation with Semi-Bandit Feedback

Authors: Tor Lattimore, Koby Crammer, Csaba Szepesvári

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present two experiments to demonstrate the behaviour of Algorithm 2. All code and data is available in the supplementary material.
Researcher Affiliation | Academia | Tor Lattimore, Department of Computing Science, University of Alberta, Canada (tor.lattimore@gmail.com); Koby Crammer, Department of Electrical Engineering, The Technion, Israel (koby@ee.technion.ac.il); Csaba Szepesvári, Department of Computing Science, University of Alberta, Canada (szepesva@ualberta.ca)
Pseudocode | Yes | Algorithm 1 and Algorithm 2 are presented in the paper.
Open Source Code | Yes | All code and data is available in the supplementary material.
Open Datasets | Yes | All code and data is available in the supplementary material. For this experiment we used D = K = 2 and n = 10^6 and ν = (8/10, 2/10; 4/10, 2/10) and M = (1, 0; 1/2, 1/2), where the kth column is the parameter/allocation for the kth task (semicolons separate matrix rows). We fix n = 5 × 10^5 and D = K = 2. For α ∈ (0, 1) we define ν_α = (1/2, α/2; 1/2, α/2) and M = (1, 0; 1, 0).
Dataset Splits | No | The paper describes a sequential online learning problem and does not state traditional training/validation/test dataset splits; it refers instead to a time horizon n and to average regret over runs.
Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., GPU/CPU models, memory details).
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For this experiment we used D = K = 2 and n = 10^6 and ν = (8/10, 2/10; 4/10, 2/10) and M = (1, 0; 1/2, 1/2), where the kth column is the parameter/allocation for the kth task. We ran two versions of the algorithm: the first exactly as given in Algorithm 2, and the second identical except that the weights were fixed to γ_tk = 4 for all t and k (this value is chosen because it corresponds to the minimum inverse variance of a Bernoulli variable). The data was produced by taking the average regret over 8 runs. We fix n = 5 × 10^5 and D = K = 2. For α ∈ (0, 1) we define ν_α = (1/2, α/2; 1/2, α/2) and M = (1, 0; 1, 0).
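
To make the reported setup concrete, here is a minimal simulation sketch of the first experiment's environment. It assumes the paper's completion model, in which task k is completed with probability min{1, ⟨M_k, ν_k⟩} for the kth columns M_k and ν_k; the variable and function names are illustrative and are not the authors' supplementary code.

```python
# Minimal sketch of the first experiment's environment, assuming the paper's
# completion model: task k is completed with probability min{1, <M_k, nu_k>},
# where M_k and nu_k are the k-th columns of the allocation and parameter
# matrices. Illustrative only; not the authors' supplementary code.
import numpy as np

rng = np.random.default_rng(0)

D = K = 2                      # number of resources and tasks
n = 10**6                      # horizon reported for the first experiment
nu = np.array([[8/10, 2/10],   # k-th column: parameter of task k
               [4/10, 2/10]])
M = np.array([[1.0, 0.0],      # k-th column: allocation for task k
              [0.5, 0.5]])

def completion_probs(alloc, params):
    # Per-task success probability: <alloc_k, params_k>, capped at 1.
    return np.minimum(1.0, np.einsum('dk,dk->k', alloc, params))

def run(alloc, params, horizon, rng):
    # Semi-bandit feedback: one Bernoulli completion indicator per task, per round.
    p = completion_probs(alloc, params)
    return rng.random((horizon, p.size)) < p

# Average per-round loss (number of failed tasks) of the fixed allocation M,
# averaged over 8 independent runs as in the reported experiments.
losses = [(~run(M, nu, n, rng)).sum(axis=1).mean() for _ in range(8)]
print(np.mean(losses), K - completion_probs(M, nu).sum())  # empirical vs expected
```

Algorithm 2 itself would update the allocation each round from the observed completion indicators; the sketch only reproduces the environment and a fixed-allocation baseline.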