High-Dimensional Sparse Linear Bandits
Authors: Botao Hao, Tor Lattimore, Mengdi Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experiment. We compare ESTC (our algorithm) with LinUCB [Abbasi-Yadkori et al., 2011] and doubly-robust (DR) lasso bandits [Kim and Paik, 2019]. For ESTC, we use the theoretically suggested length of the exploration stage. For LinUCB, we use the theoretically suggested confidence interval. For DR-lasso, we use the code made available online by the authors. Case 1: linear contextual bandits. We use the setting in Section 5 of Kim and Paik [2019] with N = 20 arms, dimension d = 100, and sparsity s = 5. At round t, we generate the action set from N(0_N, V), where V_ii = 1 and V_ik = ρ² for every i ≠ k. Larger ρ corresponds to a high-correlation setting that is more favorable to DR-lasso. The noise is drawn from N(0, 1) and ‖θ‖₀ = s. Case 2: hard problem instance. We consider the hard problem instance from the proof of the minimax lower bound (Theorem 3.3), which includes an informative action set and an uninformative action set. Since the size of the action set constructed in the hard problem instance grows exponentially with d, we uniformly sample 500 actions from the full informative action set and 200 from the uninformative action set. Conclusion: The experiments confirm our theoretical findings. Although our theory focuses on the fixed-action-set setting, ESTC works well in the contextual setting. DR-lasso bandits rely heavily on the context-distribution assumption and almost fail on the hard instance. LinUCB suffers in the data-poor regime since it ignores the sparsity information. Figure 1: The top two figures are for Case 1 and the bottom two figures are for Case 2. |
| Researcher Affiliation | Collaboration | Botao Hao Deepmind haobotao000@gmail.com Tor Lattimore Deepmind lattimore@google.com Mengdi Wang Department of Electrical Engineering Princeton University mengdiw@princeton.edu |
| Pseudocode | Yes | Algorithm 1 Explore the sparsity then commit (ESTC). 1: Input: time horizon n, action set A, exploration length n₁, regularization parameter λ₁. 2: Solve the optimization problem in Eq. (4.1) and denote the solution as μ̂. 3: for t = 1, …, n₁ do 4: Independently pull arm Aₜ according to μ̂ and receive a reward Yₜ = ⟨Aₜ, θ⟩ + ηₜ. 5: end for 6: Calculate the Lasso estimator [Tibshirani, 1996]: θ̂_{n₁} = argmin_{θ ∈ ℝᵈ} Σ_{t=1}^{n₁} (Yₜ − ⟨Aₜ, θ⟩)² + λ₁‖θ‖₁. (4.2) 7: for t = n₁ + 1 to n do 8: Take the greedy action Aₜ = argmax_{x ∈ A} ⟨θ̂_{n₁}, x⟩. 9: end for |
| Open Source Code | No | The paper does not state that the authors are releasing their own code for the methodology described. It mentions using code made available by the authors of DR-lasso, which is a third-party tool. |
| Open Datasets | No | The paper describes how data was generated for the experiments (e.g., "We use the setting in Section 5 of Kim and Paik [2019]... At round t, we generate the action set from N(0_N, V)", "we uniformly randomly sample 500 actions from the full informative action set and 200 from the uninformative action set"). It does not refer to a publicly available dataset with concrete access information (link, DOI, formal citation). |
| Dataset Splits | No | The paper does not specify exact percentages, sample counts, or citations to predefined splits for training, validation, or test sets. It describes the data generation process for its experimental cases but not a typical dataset splitting methodology. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It only describes the experimental setup in terms of data generation and algorithm parameters. |
| Software Dependencies | No | The paper mentions using Lasso [Tibshirani, 1996] and that CVXPY [Diamond and Boyd, 2016] can be used to approximate the optimization problem. It also refers to other algorithms like LinUCB and DR-lasso. However, it does not provide specific version numbers for any software libraries, frameworks, or dependencies used in their implementation of ESTC. |
| Experiment Setup | Yes | 6 Experiment. We compare ESTC (our algorithm) with LinUCB [Abbasi-Yadkori et al., 2011] and doubly-robust (DR) lasso bandits [Kim and Paik, 2019]. For ESTC, we use the theoretically suggested length of the exploration stage. For LinUCB, we use the theoretically suggested confidence interval. For DR-lasso, we use the code made available online by the authors. Case 1: linear contextual bandits. We use the setting in Section 5 of Kim and Paik [2019] with N = 20 arms, dimension d = 100, and sparsity s = 5. At round t, we generate the action set from N(0_N, V), where V_ii = 1 and V_ik = ρ² for every i ≠ k. Larger ρ corresponds to a high-correlation setting that is more favorable to DR-lasso. The noise is drawn from N(0, 1) and ‖θ‖₀ = s. Case 2: hard problem instance. We consider the hard problem instance from the proof of the minimax lower bound (Theorem 3.3), which includes an informative action set and an uninformative action set. Since the size of the action set constructed in the hard problem instance grows exponentially with d, we uniformly sample 500 actions from the full informative action set and 200 from the uninformative action set. |
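The Case 1 setup quoted above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: following the N(0_N, V) notation in the quoted setup, each feature coordinate is drawn jointly across the N arms from a Gaussian with unit variances and pairwise correlation ρ² (V_ii = 1, V_ik = ρ² for i ≠ k). The function name `case1_action_set` is ours.

```python
import numpy as np

def case1_action_set(N=20, d=100, rho=0.5, rng=None):
    """Draw one round's action set for Case 1 of Hao et al. (2020).

    Each of the d feature coordinates is sampled across the N arms from
    N(0_N, V), where V has ones on the diagonal and rho**2 off-diagonal,
    so larger rho makes the arms' features more correlated.
    """
    rng = np.random.default_rng() if rng is None else rng
    V = np.full((N, N), rho ** 2)   # V_ik = rho^2 for i != k
    np.fill_diagonal(V, 1.0)        # V_ii = 1
    X = rng.multivariate_normal(np.zeros(N), V, size=d)  # shape (d, N)
    return X.T                      # (N, d): one feature vector per arm
```

The equicorrelation matrix V is positive semi-definite for any ρ² < 1, so the draw is always valid in the paper's regime.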
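Algorithm 1 (ESTC) from the Pseudocode row can likewise be sketched end to end. This is a hedged approximation, not the paper's implementation: uniform exploration stands in for the optimal-design distribution μ̂ of Eq. (4.1), and the Lasso step of Eq. (4.2) is solved by plain ISTA rather than CVXPY; the names `estc`, `lasso_ista`, and `soft_threshold` are ours.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimise 0.5 * ||y - A @ theta||^2 + lam * ||theta||_1 via ISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - y)
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

def estc(action_set, theta_star, n, n1, lam, rng):
    """Explore-then-commit over a fixed action set (rows of `action_set`).

    Stage 1 explores for n1 rounds (uniformly here, in place of mu-hat),
    Stage 2 commits to the arm that is greedy for the Lasso estimate.
    Returns the estimate and the cumulative regret of the run.
    """
    mean_rewards = action_set @ theta_star
    opt = mean_rewards.max()
    pulls, rewards, regret = [], [], 0.0
    for _ in range(n1):                            # Stage 1: explore
        i = rng.integers(len(action_set))
        a = action_set[i]
        pulls.append(a)
        rewards.append(a @ theta_star + rng.normal())  # Y_t = <A_t, theta> + eta_t
        regret += opt - mean_rewards[i]
    theta_hat = lasso_ista(np.array(pulls), np.array(rewards), lam)
    greedy = np.argmax(action_set @ theta_hat)     # Stage 2: commit
    regret += (n - n1) * (opt - mean_rewards[greedy])
    return theta_hat, regret
```

The data-poor regime the paper targets corresponds to choosing n comparable to (or smaller than) d, where the Lasso's exploitation of sparsity is what keeps Stage 2's committed arm competitive.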