Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates

Authors: Xue Wang, Mingcheng Wei, Tao Yao

ICML 2018

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably compared with the other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
Researcher Affiliation Academia (1) Pennsylvania State University, University Park, PA, USA; (2) University at Buffalo, Buffalo, NY, USA.
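Pseudocode Yes MCP-Bandit Algorithm
  Require: input parameters $q$, $h$, $\lambda_1$, $\lambda_{2,0}$
  Initialize $\hat\beta^M(T_{i,0}, \lambda_1)$ and $\hat\beta^M(S_{i,0}, \lambda_{2,0})$ for $i \in \mathcal{K}$
  for $t = 1, 2, \ldots$ do
    Observe $x_t$
    If $t \in T_i$ for $i = 1, 2, \ldots, K$
      Set $\pi_t$ to $i$
    Else
      Update $\hat\beta^M(T_{i,t-1}, \lambda_1)$ for $i \in \mathcal{K}$ with 2sWL
      $\hat{\mathcal{K}} = \{ i \mid x_t^T \hat\beta^M(T_{i,t-1}, \lambda_1) \ge \max_{j \in \mathcal{K}} \{ x_t^T \hat\beta^M(T_{j,t-1}, \lambda_1) \} - h/2 \}$
      Update $\hat\beta^M(S_{i,t-1}, \lambda_{2,t-1})$ for $i \in \hat{\mathcal{K}}$ with 2sWL
      $\pi_t = \arg\max_{i \in \hat{\mathcal{K}}} \{ x_t^T \hat\beta^M(S_{i,t-1}, \lambda_{2,t-1}) \}$
    Set $S_{\pi_t,t}$ to $S_{\pi_t,t-1} \cup \{t\}$ and $\lambda_{2,t}$ to $\lambda_{2,0} \sqrt{(\log t + \log d)/t}$
    Play arm $\pi_t$ and observe $y_t$
  end for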
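The control flow of the MCP-Bandit pseudocode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The forced-sampling sets T_i are not defined in this summary, so the sketch borrows the doubling schedule commonly used with Lasso-Bandit. The paper's two-step weighted Lasso estimator is replaced by a plain ridge fit. The filter step is read here as "keep arms whose forced-sample estimate is within h/2 of the best". The names `forced_arm`, `ridge_fit`, and `run_bandit` are hypothetical.

```python
import numpy as np

def forced_arm(t, K, q):
    """Arm forced at round t (1-indexed), or None.
    Assumed schedule: T_i = {(2^n - 1)Kq + j : n >= 0, q(i-1) < j <= qi}."""
    n = 0
    while (2 ** n - 1) * K * q < t:
        j = t - (2 ** n - 1) * K * q
        if j <= K * q:
            return (j - 1) // q  # 0-indexed arm
        n += 1
    return None

def ridge_fit(X, y, lam):
    """Stand-in estimator (NOT the paper's MCP / 2sWL step): ridge regression."""
    d = X.shape[1]
    penalty = (lam * len(y) + 1e-8) * np.eye(d)
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def run_bandit(X, reward_fn, K, q, h, lam1, lam2_0, fit=ridge_fit):
    """Run the loop over the rows of X; reward_fn(arm, x) returns the observed reward."""
    T, d = X.shape
    forced = [[] for _ in range(K)]    # T_{i,t}: forced-sample rounds of arm i
    samples = [[] for _ in range(K)]   # S_{i,t}: all rounds in which arm i was played
    y = np.zeros(T)
    arms = np.zeros(T, dtype=int)
    lam2 = lam2_0
    for t in range(1, T + 1):
        x = X[t - 1]
        arm = forced_arm(t, K, q)
        if arm is not None:
            forced[arm].append(t - 1)
        else:
            # Step 1: forced-sample estimates decide which arms stay in contention.
            scores = np.array([
                x @ fit(X[np.asarray(idx, dtype=int)], y[np.asarray(idx, dtype=int)], lam1)
                for idx in forced
            ])
            keep = np.flatnonzero(scores >= scores.max() - h / 2)
            # Step 2: all-sample estimates pick the arm among the survivors.
            est = [
                x @ fit(X[np.asarray(samples[i], dtype=int)],
                        y[np.asarray(samples[i], dtype=int)], lam2)
                for i in keep
            ]
            arm = int(keep[int(np.argmax(est))])
        samples[arm].append(t - 1)
        lam2 = lam2_0 * np.sqrt((np.log(t) + np.log(d)) / t)
        y[t - 1] = reward_fn(arm, x)
        arms[t - 1] = arm
    return arms, y

if __name__ == "__main__":
    # Toy check: one constant covariate, arm 1 always pays 1 and arm 0 pays 0.
    arms, _ = run_bandit(np.ones((40, 1)), lambda arm, x: float(arm),
                         K=2, q=1, h=0.5, lam1=0.05, lam2_0=0.05)
    print(arms)
```

Passing the estimator as the `fit` argument keeps the loop independent of the penalty, so an MCP-based solver could be substituted for the ridge stand-in without touching the arm-selection logic.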
Open Source Code No The paper does not provide any explicit statements or links indicating the availability of its source code.
Open Datasets Yes The second experiment considers a health-care decision-making process in which physicians determine the optimal warfarin dosage for every incoming patient. The warfarin dosing patient data (Consortium et al. 2009), which is known to be dense (e.g., log T is not necessarily larger than s), contains approximately 100 detailed covariates for 5,700 patients.
Dataset Splits No The paper mentions generating covariates and errors for synthetic data and using patient data, but it does not provide specific training, validation, or test dataset splits (e.g., percentages or counts).
Hardware Specification No The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies No The paper does not list any software dependencies with specific version numbers.
Experiment Setup Yes In the synthetic data experiment, we present a two-arm bandit setting with decision parameter $\beta_i$, $i = 1, 2$. To simulate different sparsity levels, we generate four possible covariate dimensions, $d = 10, 10^2, 10^3$, and $10^4$, and keep the dimension for significant covariates unchanged at $s = 5$. ... We arbitrarily set the coefficients for significant covariates for the first arm to be $\beta_1 = (1, 2, 3, 4, 5)$ and for the second arm to be $\beta_2 = 1.1\beta_1$. The covariates are generated from $N(0, \Sigma)$, where $\Sigma_{ij} = 0.5^{|i-j|}$, and the random error $\epsilon$ follows $N(0, 1)$. For each covariate dimension, we generate an average of 10,000 trials. ... we share the same parameter $\lambda$ in both the Lasso-Bandit algorithm and the MCP-Bandit algorithm and set the MCP-Bandit algorithm's own parameter $a$ to 2.
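The synthetic setup described above can be reproduced approximately with the following sketch. It is an illustration under assumptions: the excerpt does not say which coordinates are the significant ones (the first s are used here), nor whether noise is drawn independently per arm (it is here). The function names are hypothetical. Covariates with covariance 0.5^|i-j| are drawn through an AR(1) recursion, which yields that covariance exactly without forming a d-by-d matrix, a practical concern at d = 10^4.

```python
import numpy as np

def sample_covariates(T, d, rho=0.5, rng=None):
    """Draw T rows from N(0, Sigma) with Sigma_ij = rho**|i-j| via an AR(1) recursion."""
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.standard_normal((T, d))
    X = np.empty((T, d))
    X[:, 0] = Z[:, 0]
    scale = np.sqrt(1.0 - rho ** 2)  # keeps every coordinate at unit variance
    for k in range(1, d):
        X[:, k] = rho * X[:, k - 1] + scale * Z[:, k]
    return X

def make_arm_coefficients(d, s=5):
    """Two sparse coefficient vectors: beta_1 = (1, ..., s, 0, ..., 0), beta_2 = 1.1 * beta_1.
    Assumption: the s significant covariates are the first s coordinates."""
    beta1 = np.zeros(d)
    beta1[:s] = np.arange(1, s + 1)
    return np.stack([beta1, 1.1 * beta1])

def sample_rewards(X, betas, rng=None):
    """Reward of every arm at every round: x_t . beta_i + N(0, 1) noise (independent per arm)."""
    rng = np.random.default_rng() if rng is None else rng
    return X @ betas.T + rng.standard_normal((X.shape[0], betas.shape[0]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = sample_covariates(T=1000, d=100, rng=rng)
    betas = make_arm_coefficients(d=100)
    R = sample_rewards(X, betas, rng=rng)
    print(X.shape, betas.shape, R.shape)
```

The reward matrix holds one column per arm, so a bandit simulation can look up the realized reward of whichever arm it plays at each round.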