Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates
Authors: Xue Wang, Mingcheng Wei, Tao Yao
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably over other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small. |
| Researcher Affiliation | Academia | ¹Pennsylvania State University, University Park, PA, USA; ²University at Buffalo, Buffalo, NY, USA. |
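| Pseudocode | Yes | MCP-Bandit Algorithm. Require: input parameters $q$, $h$, $\lambda_1$, $\lambda_{2,0}$. Initialize $\hat\beta^M(T_{i,0}, \lambda_1)$ and $\hat\beta^M(S_{i,0}, \lambda_{2,0})$ for $i \in \mathcal{K}$. For $t = 1, 2, \ldots$ do: Observe $x_t$. If $t \in T_i$ for $i = 1, 2, \ldots, K$: set $\pi_t$ to $i$. Else: update $\hat\beta^M(T_{i,t-1}, \lambda_1)$ for $i \in \mathcal{K}$ with 2sWL; set $\hat{\mathcal{K}} = \{ i \mid x_t^T \hat\beta^M(T_{i,t-1}, \lambda_1) \ge \max_{j \in \mathcal{K}} \{ x_t^T \hat\beta^M(T_{j,t-1}, \lambda_1) \} - h/2 \}$; update $\hat\beta^M(S_{i,t-1}, \lambda_{2,t-1})$ for $i \in \hat{\mathcal{K}}$ with 2sWL; set $\pi_t = \arg\max_{i \in \hat{\mathcal{K}}} \{ x_t^T \hat\beta^M(S_{i,t-1}, \lambda_{2,t-1}) \}$. Set $S_{\pi_t,t}$ to $S_{\pi_t,t-1} \cup \{t\}$ and $\lambda_{2,t}$ to $\lambda_{2,0} \sqrt{(\log t + \log d)/t}$. Play arm $\pi_t$ and observe $y_t$. End for. |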
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of its source code. |
| Open Datasets | Yes | The second experiment considers a health-care decision-making process in which physicians determine the optimal warfarin dosage for every incoming patient. The warfarin dosing patient data (Consortium et al. 2009), which is known to be dense (e.g., log T is not necessarily larger than s), contains approximately 100 detailed covariates for 5,700 patients. |
| Dataset Splits | No | The paper mentions generating covariates and errors for synthetic data and using patient data, but it does not provide specific training, validation, or test dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list any software dependencies with specific version numbers. |
| Experiment Setup | Yes | In the synthetic data experiment, we present a two-arm bandit setting with decision parameter $\beta_i$, $i = 1, 2$. To simulate different sparsity levels, we generate four possible covariate dimensions, $d = 10, 10^2, 10^3$, and $10^4$, and keep the dimension for significant covariates unchanged at $s = 5$. ... We arbitrarily set the coefficients for significant covariates for the first arm to be $\beta_1 = (1, 2, 3, 4, 5)$ and for the second arm to be $\beta_2 = 1.1 \beta_1$. The covariates are generated from $N(0, \Sigma)$, where $\Sigma_{ij} = 0.5^{\lvert i - j \rvert}$, and the random error $\epsilon$ follows $N(0, 1)$. For each covariates dimension, we generate an average of 10,000 trials. ... we share the same parameter $\lambda$ in both the Lasso-Bandit algorithm and the MCP-Bandit algorithm and select the unique parameter for the MCP-Bandit algorithm, $a$, at 2. |
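
Since no source code is released, the synthetic setup quoted in the Experiment Setup row can be approximated with a short sketch. The Python below is illustrative only and is not the authors' implementation. It assumes the five significant covariates occupy the first five coordinates and that rewards follow the linear model $y = x^T \beta_i + \epsilon$; neither detail is stated in the excerpt. The function names `make_synthetic_bandit`, `lambda2_schedule`, and `mcp_penalty` are hypothetical. The penalty uses the standard minimax concave penalty form with concavity parameter $a$, which may be parameterized differently in the paper.

```python
import numpy as np


def make_synthetic_bandit(d, s=5, rho=0.5, seed=0):
    """Build the two-arm synthetic problem described in the setup row."""
    rng = np.random.default_rng(seed)

    # Assumption: the s significant covariates are the first s coordinates.
    beta1 = np.zeros(d)
    beta1[:s] = np.arange(1, s + 1)      # (1, 2, 3, 4, 5) when s = 5
    beta2 = 1.1 * beta1

    # Covariance with entries rho^{|i - j|}; dense, so large d is memory-heavy.
    idx = np.arange(d)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(sigma)

    def draw(n):
        """Draw n covariate vectors and the noisy reward of each arm."""
        x = rng.standard_normal((n, d)) @ chol.T      # rows ~ N(0, sigma)
        noise = rng.standard_normal((n, 2))           # epsilon ~ N(0, 1)
        # Assumption: linear reward model y = x^T beta_i + epsilon.
        rewards = x @ np.column_stack([beta1, beta2]) + noise
        return x, rewards

    return beta1, beta2, sigma, draw


def lambda2_schedule(t, d, lambda20):
    """Regularization update lambda_{2,t} = lambda_{2,0} sqrt((log t + log d) / t)."""
    return lambda20 * np.sqrt((np.log(t) + np.log(d)) / t)


def mcp_penalty(beta, lam, a=2.0):
    """Standard MCP value per coordinate, with concavity parameter a."""
    b = np.abs(beta)
    return np.where(b <= a * lam, lam * b - b ** 2 / (2 * a), a * lam ** 2 / 2)


if __name__ == "__main__":
    beta1, beta2, sigma, draw = make_synthetic_bandit(d=10)
    x, rewards = draw(3)
    print(x.shape, rewards.shape)
    print(lambda2_schedule(t=100, d=10, lambda20=1.0))
```

Sampling through a Cholesky factor keeps the example dependency-free beyond NumPy, but it materializes the full $d \times d$ covariance matrix, so the $d = 10^4$ case needs several hundred megabytes of memory.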