High-Dimensional Sparse Linear Bandits
Authors: Botao Hao, Tor Lattimore, Mengdi Wang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experiment. We compare ESTC (our algorithm) with LinUCB [Abbasi-Yadkori et al., 2011] and doubly-robust (DR) lasso bandits [Kim and Paik, 2019]. For ESTC, we use the theoretically suggested length of the exploration stage. For LinUCB, we use the theoretically suggested confidence interval. For DR-lasso, we use the code made available online by the authors. Case 1: linear contextual bandits. We use the setting in Section 5 of Kim and Paik [2019] with N = 20 arms, dimension d = 100, and sparsity s = 5. At round t, we generate the action set from N(0_N, V), where V_ii = 1 and V_ik = ρ² for every i ≠ k. Larger ρ corresponds to a high-correlation setting that is more favorable to DR-lasso. The noise is drawn from N(0, 1) and ‖θ‖₀ = s. Case 2: hard problem instance. We consider the hard problem instance from the proof of the minimax lower bound (Theorem 3.3), which includes an informative action set and an uninformative action set. Since the size of the action set constructed in the hard problem instance grows exponentially with d, we uniformly sample 500 actions from the full informative action set and 200 from the uninformative action set. Conclusion: The experiments confirm our theoretical findings. Although our theory focuses on the fixed-action-set setting, ESTC works well in the contextual setting. DR-lasso bandits rely heavily on the context-distribution assumption and almost fail on the hard instance. LinUCB suffers in the data-poor regime since it ignores the sparsity information. Figure 1: The top two figures are for Case 1 and the bottom two figures are for Case 2. |
| Researcher Affiliation | Collaboration | Botao Hao Deepmind haobotao000@gmail.com Tor Lattimore Deepmind lattimore@google.com Mengdi Wang Department of Electrical Engineering Princeton University mengdiw@princeton.edu |
| Pseudocode | Yes | Algorithm 1 Explore the sparsity then commit (ESTC). 1: Input: time horizon n, action set A, exploration length n₁, regularization parameter λ₁. 2: Solve the optimization problem in Eq. (4.1) and denote the solution as μ̂. 3: for t = 1, …, n₁ do 4: Independently pull arm Aₜ according to μ̂ and receive a reward Yₜ = ⟨Aₜ, θ⟩ + ηₜ. 5: end for 6: Calculate the Lasso estimator [Tibshirani, 1996]: θ̂_{n₁} = argmin_{θ ∈ ℝᵈ} Σ_{t=1}^{n₁} (Yₜ − ⟨Aₜ, θ⟩)² + λ₁‖θ‖₁. (4.2) 7: for t = n₁ + 1 to n do 8: Take the greedy action Aₜ = argmax_{x ∈ A} ⟨θ̂_{n₁}, x⟩. 9: end for |
| Open Source Code | No | The paper does not state that the authors are releasing their own code for the methodology described. It mentions using code made available by the authors of DR-lasso, which is a third-party tool. |
| Open Datasets | No | The paper describes how data was generated for the experiments (e.g., "We use the setting in Section 5 of Kim and Paik [2019]... At round t, we generate the action set from N(0_N, V)", "we uniformly randomly sample 500 actions from the full informative action set and 200 from the uninformative action set"). It does not refer to a publicly available dataset with concrete access information (link, DOI, formal citation). |
| Dataset Splits | No | The paper does not specify exact percentages, sample counts, or citations to predefined splits for training, validation, or test sets. It describes the data generation process for its experimental cases but not a typical dataset splitting methodology. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It only describes the experimental setup in terms of data generation and algorithm parameters. |
| Software Dependencies | No | The paper mentions using Lasso [Tibshirani, 1996] and that CVXPY [Diamond and Boyd, 2016] can be used to approximate the optimization problem. It also refers to other algorithms like LinUCB and DR-lasso. However, it does not provide specific version numbers for any software libraries, frameworks, or dependencies used in their implementation of ESTC. |
| Experiment Setup | Yes | 6 Experiment. We compare ESTC (our algorithm) with LinUCB [Abbasi-Yadkori et al., 2011] and doubly-robust (DR) lasso bandits [Kim and Paik, 2019]. For ESTC, we use the theoretically suggested length of the exploration stage. For LinUCB, we use the theoretically suggested confidence interval. For DR-lasso, we use the code made available online by the authors. Case 1: linear contextual bandits. We use the setting in Section 5 of Kim and Paik [2019] with N = 20 arms, dimension d = 100, and sparsity s = 5. At round t, we generate the action set from N(0_N, V), where V_ii = 1 and V_ik = ρ² for every i ≠ k. Larger ρ corresponds to a high-correlation setting that is more favorable to DR-lasso. The noise is drawn from N(0, 1) and ‖θ‖₀ = s. Case 2: hard problem instance. We consider the hard problem instance from the proof of the minimax lower bound (Theorem 3.3), which includes an informative action set and an uninformative action set. Since the size of the action set constructed in the hard problem instance grows exponentially with d, we uniformly sample 500 actions from the full informative action set and 200 from the uninformative action set. |
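The Case 1 setup quoted above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: following the N(0_N, V) notation in the quoted setup, each feature coordinate is drawn jointly across the N arms from a Gaussian with unit variances and pairwise correlation ρ² (V_ii = 1, V_ik = ρ² for i ≠ k). The function name `case1_action_set` is ours.

```python
import numpy as np

def case1_action_set(N=20, d=100, rho=0.5, rng=None):
    """Draw one round's action set for Case 1 of Hao et al. (2020).

    Each of the d feature coordinates is sampled across the N arms from
    N(0_N, V), where V has ones on the diagonal and rho**2 off-diagonal,
    so larger rho makes the arms' features more correlated.
    """
    rng = np.random.default_rng() if rng is None else rng
    V = np.full((N, N), rho ** 2)   # V_ik = rho^2 for i != k
    np.fill_diagonal(V, 1.0)        # V_ii = 1
    X = rng.multivariate_normal(np.zeros(N), V, size=d)  # shape (d, N)
    return X.T                      # (N, d): one feature vector per arm
```

The equicorrelation matrix V is positive semi-definite for any ρ² < 1, so the draw is always valid in the paper's regime.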
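Algorithm 1 (ESTC) from the Pseudocode row can likewise be sketched end to end. This is a hedged approximation, not the paper's implementation: uniform exploration stands in for the optimal-design distribution μ̂ of Eq. (4.1), and the Lasso step of Eq. (4.2) is solved by plain ISTA rather than CVXPY; the names `estc`, `lasso_ista`, and `soft_threshold` are ours.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimise 0.5 * ||y - A @ theta||^2 + lam * ||theta||_1 via ISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - y)
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

def estc(action_set, theta_star, n, n1, lam, rng):
    """Explore-then-commit over a fixed action set (rows of `action_set`).

    Stage 1 explores for n1 rounds (uniformly here, in place of mu-hat),
    Stage 2 commits to the arm that is greedy for the Lasso estimate.
    Returns the estimate and the cumulative regret of the run.
    """
    mean_rewards = action_set @ theta_star
    opt = mean_rewards.max()
    pulls, rewards, regret = [], [], 0.0
    for _ in range(n1):                            # Stage 1: explore
        i = rng.integers(len(action_set))
        a = action_set[i]
        pulls.append(a)
        rewards.append(a @ theta_star + rng.normal())  # Y_t = <A_t, theta> + eta_t
        regret += opt - mean_rewards[i]
    theta_hat = lasso_ista(np.array(pulls), np.array(rewards), lam)
    greedy = np.argmax(action_set @ theta_hat)     # Stage 2: commit
    regret += (n - n1) * (opt - mean_rewards[greedy])
    return theta_hat, regret
```

The data-poor regime the paper targets corresponds to choosing n comparable to (or smaller than) d, where the Lasso's exploitation of sparsity is what keeps Stage 2's committed arm competitive.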