Model Selection in Contextual Stochastic Bandit Problems
Authors: Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvári
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment (Figure 1). Let $d = 2$. Consider a contextual bandit problem with $k = 50$ arms, where each arm $j$ has an associated vector $a_j \in \mathbb{R}^d$ sampled uniformly at random from $[0, 1]^d$. We consider two cases: (1) For $\theta \in \mathbb{R}^d$ sampled uniformly at random from $[0, 1]^d$, the reward of arm $j$ at time $t$ is $a_j^\top \theta + \eta_t$, where $\eta_t \sim N(0, 1)$, and (2) There are $k$ parameters $\mu_j$ for $j \in [k]$, all sampled uniformly at random from $[0, 10]$, so that the reward of arm $j$ at time $t$ is sampled from $N(\mu_j, 1)$. We use CORRAL with learning rate = 2 p T d and UCB and LinUCB as base algorithms. In case (1) LinUCB performs better, while in case (2) UCB performs better. Each experiment is repeated 500 times. |
| Researcher Affiliation | Collaboration | Aldo Pacchiano, UC Berkeley; My Phan, University of Massachusetts; Yasin Abbasi-Yadkori; Julian Zimmert, Google Research; Tor Lattimore; Csaba Szepesvári, DeepMind and University of Alberta |
| Pseudocode | Yes | Algorithm 1 (Master Algorithm). Input: base algorithms $\{B_j\}_{j=1}^M$. for $t = 1, \dots, T$ do: play base $j_t$; receive feedback $r_t = r_{t,j_t}$ from $B_{j_t}$; update itself using $r_t$. end for |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes synthetic data generation setups for its experiments, such as actions sampled uniformly at random from a distribution or Bernoulli arms, but it does not provide concrete access information for any publicly available or open dataset. |
| Dataset Splits | No | The paper describes experimental setups with synthetic data but does not specify any train/validation/test dataset splits, as it does not rely on pre-existing benchmark datasets with defined splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper discusses algorithms and mathematical formulations but does not mention specific software dependencies or their version numbers required for replication. |
| Experiment Setup | Yes | Experiment (Figure 1). Let $d = 2$. Consider a contextual bandit problem with $k = 50$ arms... We use CORRAL with learning rate = 2 p T d and UCB and LinUCB as base algorithms... Experiment (Figure 2)... We take $T = 50{,}000$, = 20/T and s to lie on a geometric grid in [1, 2T]. |
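
The Figure 1 setup and the master loop of Algorithm 1 quoted in the table can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function and class names (`make_linear_means`, `make_mab_means`, `UCBBase`, `UniformMaster`, `run_master`) are hypothetical, the master shown picks a base uniformly at random as a stand-in for CORRAL, and only a simple UCB1-style base is included (LinUCB is omitted).

```python
import numpy as np


def make_linear_means(k=50, d=2, rng=None):
    """Case (1): mean reward of arm j is the inner product <a_j, theta>."""
    rng = np.random.default_rng() if rng is None else rng
    arms = rng.uniform(0.0, 1.0, size=(k, d))  # a_j uniform on [0, 1]^d
    theta = rng.uniform(0.0, 1.0, size=d)      # shared parameter vector
    return arms, arms @ theta


def make_mab_means(k=50, rng=None):
    """Case (2): mean reward mu_j of each arm is uniform on [0, 10]."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(0.0, 10.0, size=k)


class UCBBase:
    """A plain UCB1-style base learner over k arms (illustrative only)."""

    def __init__(self, k):
        self.counts = np.zeros(k)
        self.sums = np.zeros(k)

    def choose(self):
        # Try every arm once before using confidence bounds.
        untried = np.flatnonzero(self.counts == 0)
        if untried.size > 0:
            return int(untried[0])
        t = self.counts.sum()
        bonus = np.sqrt(2.0 * np.log(t) / self.counts)
        return int(np.argmax(self.sums / self.counts + bonus))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class UniformMaster:
    """Placeholder master: picks a base uniformly at random (NOT CORRAL)."""

    def __init__(self, num_bases, rng):
        self.num_bases = num_bases
        self.rng = rng
        self.plays = np.zeros(num_bases, dtype=int)

    def select(self):
        return int(self.rng.integers(self.num_bases))

    def update(self, base_index, reward):
        # A real master would adjust its sampling distribution here.
        self.plays[base_index] += 1


def run_master(master, bases, means, T, rng):
    """Master loop: play base j_t, receive r_t from it, update the master."""
    rewards = np.empty(T)
    for t in range(T):
        j_t = master.select()
        arm = bases[j_t].choose()
        r_t = means[arm] + rng.normal()  # unit-variance Gaussian noise
        bases[j_t].update(arm, r_t)
        master.update(j_t, r_t)
        rewards[t] = r_t
    return rewards
```

To approximate the reported protocol, one would replace `UniformMaster` with a CORRAL implementation, add a LinUCB base that uses the arm vectors, and average the results over 500 independent runs.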