High-dimensional Linear Bandits with Knapsacks
Authors: Wanteng Ma, Dong Xia, Jiashuo Jiang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7. Numerical Results: We first examine the feasibility of our primal algorithm in the sparse recovery problem. To check the performance of Algorithm 1, suppose now we only consider one arm $\mu$, and we want to estimate it in an online process. To this end, we always choose $y_t = 1$ and thus $p_t = 1$. At each $t$, we measure the sparse estimation error $\lVert \mu^s_t - \mu \rVert_2^2$ and the support recovery rate $\lvert \mathrm{supp}(\mu^s_t) \cap \Omega \rvert / s_0$, which indicates the ratio of the support set we have detected. The result is presented in Figure 1. (A minimal simulation sketch of this recovery setup appears after the table.) |
| Researcher Affiliation | Academia | Wanteng Ma 1 Dong Xia 1 Jiashuo Jiang 2 1Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong SAR 2Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Hong Kong SAR. Correspondence to: Jiashuo Jiang <jsjiang@ust.hk>. |
| Pseudocode | Yes | Algorithm 1: Online Hard Thresholding with Averaged Gradient (Online HT); Algorithm 2: Primal-Dual High-dimensional BwK Algorithm; Algorithm 3: High-Dimensional Bandit by Online HT |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code. |
| Open Datasets | No | The paper describes synthetic data generation for its experiments, e.g., "Here we set $d = 1000$, $s_0 = 10$, $\sigma = 0.5$, and $\Sigma$ to be the power-decaying covariance matrix: $\Sigma_{ij} = \alpha^{\lvert i-j \rvert}$, where $\alpha = 0.5$" (Section 7.1). It does not refer to using any publicly available datasets. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It describes generating synthetic data and setting parameters for simulations without specifying data partitioning for reproduction. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, nor does it specify any GPU/CPU models, processor types, or memory details. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | To this end, we always choose $y_t = 1$ and thus $p_t = 1$. At each $t$, we measure the sparse estimation error $\lVert \mu^s_t - \mu \rVert_2^2$ and the support recovery rate $\lvert \mathrm{supp}(\mu^s_t) \cap \Omega \rvert / s_0$, which indicates the ratio of the support set we have detected. The result is presented in Figure 1. Here we set $d = 1000$, $s_0 = 10$, $\sigma = 0.5$, and $\Sigma$ to be the power-decaying covariance matrix: $\Sigma_{ij} = \alpha^{\lvert i-j \rvert}$, where $\alpha = 0.5$. Compared with the prevalent LASSO method used in online high-dimensional bandit problems (Kim & Paik, 2019; Hao et al., 2020; Ren & Zhou, 2023), our method shares the same efficient computational cost while achieving better estimation error. See Figure 1 for the arm estimation and support-set recovery of our method. For the bandit problem, we choose $d = 100$, $s_0 = 10$, $K = 5$. The covariates are still generated following Section 7.1. We study the regret accumulation for a fixed $T$ and the regret growth with respect to different values of $T$, respectively. The result is presented in Figure 2. Here, we mainly compare our $\epsilon$-greedy Online HT method with the LASSO bandit algorithm (Explore-Then-Commit method) in, e.g., (Hao et al., 2020; Li et al., 2022; Jang et al., 2022). In our simulation, we try different lengths of the exploration phase $t_1$, namely $t_1 = 0.3T^{2/3}$ and $t_1 = 0.5T^{2/3}$, for the LASSO bandit algorithm. The greedy Online HT means we simply set each $\epsilon_t = 0$. It can be observed that our method outperforms the LASSO bandit algorithm in regret growth, and the greedy Online HT shows far slower regret growth than the other algorithms. We now focus on the linear BwK problem with high-dimensional sparse arms. We show the performance of our algorithm, together with the classic UCB-based linear BwK algorithm, i.e., linCBwK (Agrawal & Devanur, 2016), to demonstrate the feasibility of the Online HT method. Notice that, in the original paper of Agrawal & Devanur (2016), the linCBwK algorithm is designed for the Model-C bandit problem, but it can be easily generalized to our Model-P setting by computing the UCB of multiple arms at the same time. We set $d = 200$, $s_0 = 10$, $K = 5$, with data generated following Section 7.1. The constraints are randomly generated following a uniform distribution with $m = 5$, and each row of $W_a$ is also sparse, with the same support set as $\mu_a$. We present our method's regret and relative-regret control in Figure 3. The relative regret is defined by Regret/OPT. It can be observed that when $T$ is small, linCBwK fails to control the cumulative regret due to the high dimensionality of the problem. As $T$ grows larger, the impact of high dimensionality decreases and the two methods behave comparably. The relative-regret curves also show this phenomenon. Our Online HT methods achieve faster convergence of the relative regret in the data-poor regime. (Hedged simulation sketches of the $\epsilon$-greedy selection and the knapsack accounting appear after the table.) |
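
The sparse-recovery experiment quoted in the "Research Type" row is straightforward to approximate. The sketch below is a minimal reconstruction under stated assumptions: the averaged-gradient hard-thresholding update, the step size `eta`, and the horizon `T` are our own illustrative choices rather than the paper's exact Algorithm 1; only the problem parameters ($d = 1000$, $s_0 = 10$, $\sigma = 0.5$, power-decaying $\Sigma$ with $\alpha = 0.5$) come from the quoted text.

```python
# Hedged sketch: online hard-thresholding recovery of one sparse arm,
# with the Section 7.1 parameters. The update rule and eta are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, s0, sigma, alpha, T = 1000, 10, 0.5, 0.5, 2000

# Power-decaying covariance Sigma_ij = alpha^{|i-j|} and a sparse ground truth.
Sigma = alpha ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
L = np.linalg.cholesky(Sigma)
support = rng.choice(d, size=s0, replace=False)
mu = np.zeros(d)
mu[support] = rng.normal(size=s0)

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

mu_hat = np.zeros(d)
avg_grad = np.zeros(d)
eta = 0.3  # assumed step size
for t in range(1, T + 1):
    x = L @ rng.normal(size=d)            # covariate with covariance Sigma
    r = x @ mu + sigma * rng.normal()     # noisy linear reward
    grad = (x @ mu_hat - r) * x           # gradient of 0.5 * (x'mu_hat - r)^2
    avg_grad += (grad - avg_grad) / t     # running average of past gradients
    mu_hat = hard_threshold(mu_hat - eta * avg_grad, s0)

err = np.sum((mu_hat - mu) ** 2)
recovery = len(set(np.flatnonzero(mu_hat)) & set(support)) / s0
print(f"estimation error {err:.4f}, support recovery {recovery:.2f}")
```

Tracking `err` and `recovery` over $t$ reproduces the two quantities reported in the paper's Figure 1: the squared estimation error and the fraction of the true support detected.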
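For the bandit comparison in the "Experiment Setup" row, the skeleton below shows how an estimation routine of this kind plugs into $\epsilon$-greedy arm selection over $K$ arms. The exploration schedule `eps_t`, the identity covariate covariance, and the per-arm averaged-gradient update are illustrative assumptions; the paper's Algorithm 3 specifies its own schedule, and setting every $\epsilon_t = 0$ recovers the "greedy Online HT" variant quoted above.

```python
# Hedged skeleton: epsilon-greedy selection over K sparse arms
# (d = 100, s0 = 10, K = 5 per the quoted setup). The schedule and the
# per-arm update are assumptions, not the paper's exact Algorithm 3.
import numpy as np

rng = np.random.default_rng(1)
d, s0, K, T, sigma = 100, 10, 5, 5000, 0.5

# Ground-truth sparse arms mu_a for the simulation.
mu = np.zeros((K, d))
for a in range(K):
    supp = rng.choice(d, size=s0, replace=False)
    mu[a, supp] = rng.normal(size=s0)

def hard_threshold(v, k):
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

mu_hat = np.zeros((K, d))     # per-arm sparse estimates
avg_grad = np.zeros((K, d))   # per-arm averaged gradients
pulls = np.zeros(K, dtype=int)
eta, regret = 0.3, 0.0

for t in range(1, T + 1):
    x = rng.normal(size=d)                 # covariate (identity covariance for brevity)
    eps_t = min(1.0, 5.0 / np.sqrt(t))     # assumed exploration schedule
    if rng.random() < eps_t:
        a = int(rng.integers(K))           # explore uniformly
    else:
        a = int(np.argmax(mu_hat @ x))     # exploit; eps_t = 0 gives greedy Online HT
    r = x @ mu[a] + sigma * rng.normal()
    pulls[a] += 1
    grad = (x @ mu_hat[a] - r) * x         # squared-loss gradient for the pulled arm
    avg_grad[a] += (grad - avg_grad[a]) / pulls[a]
    mu_hat[a] = hard_threshold(mu_hat[a] - eta * avg_grad[a], s0)
    regret += np.max(mu @ x) - x @ mu[a]   # pseudo-regret against the best arm

print(f"cumulative pseudo-regret after T={T}: {regret:.1f}")
```

Running this for several values of $T$ and plotting `regret` against $t$ mirrors the regret-growth comparison in the paper's Figure 2.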
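Finally, the BwK part of the setup can be sketched as a resource-accounting loop: $m = 5$ budgets, per-round consumption $W_a^\top x_t$, stopping once any knapsack is exhausted, and relative regret Regret/OPT. The sketch below shows only this accounting; `choose_arm` is a hypothetical placeholder for the paper's primal-dual Algorithm 2, the budget level `B` is not specified in the quote, and clipping consumption at zero is our own simplification.

```python
# Hedged sketch of the knapsack accounting in the BwK comparison
# (d = 200, s0 = 10, K = 5, m = 5 per the quoted setup).
import numpy as np

rng = np.random.default_rng(2)
d, s0, K, m, T = 200, 10, 5, 5, 10000
B = np.full(m, 0.25 * T)   # assumed per-resource budgets (not given in the quote)

# Sparse arms mu_a and consumption matrices W_a with matching supports,
# as described: each row of W_a shares the support set of mu_a.
mu = np.zeros((K, d))
W = np.zeros((K, m, d))
for a in range(K):
    supp = rng.choice(d, size=s0, replace=False)
    mu[a, supp] = rng.normal(size=s0)
    W[a][:, supp] = rng.uniform(size=(m, s0))

def choose_arm(x):
    """Hypothetical placeholder; Algorithm 2 trades reward against dual prices."""
    return int(rng.integers(K))

reward, spent = 0.0, np.zeros(m)
for t in range(T):
    x = rng.normal(size=d)
    a = choose_arm(x)
    reward += x @ mu[a]
    spent += np.maximum(W[a] @ x, 0.0)   # consumption, clipped non-negative (simplification)
    if np.any(spent >= B):               # stop once any knapsack is exhausted
        break

# Relative regret = (OPT - reward) / OPT, where OPT is the offline LP
# benchmark value; computing OPT requires the distributional LP and is omitted.
print(f"stopped at round {t}, total reward {reward:.1f}")
```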