Doubly-Robust Lasso Bandit
Authors: Gi-Soo Kim, Myunghee Cho Paik
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct simulation studies to evaluate the proposed DR Lasso Bandit and the Lasso Bandit (Bastani and Bayati, 2015). We set N = 10, 20, 50, or 100, d = 100, and s0 = 5. For fixed j = 1, ..., d, we generate [b_{1j}(t), ..., b_{Nj}(t)]^T from N(0_N, V), where V(i, i) = 1 for every i and V(i, k) = ρ² for every i ≠ k. We experiment with two cases for ρ²: ρ² = 0.3 (weak correlation) or ρ² = 0.7 (strong correlation). We generate η_i(t) i.i.d. N(0, 0.05²) and the rewards from (1). We set ‖β‖₀ = s0 and generate the s0 non-zero elements from a uniform distribution on [0, 1]. We conduct 10 replications for each case. The Lasso Bandit algorithm can be applied in our setting by considering an Nd-dimensional context vector b(t) = [b_1(t)^T, ..., b_N(t)^T]^T and an Nd-dimensional regression parameter β_i for each arm i, where β_i = [I(i = 1)β^T, ..., I(i = N)β^T]^T. For each algorithm, we consider several candidates for the tuning parameters and report the best results. For DR Lasso Bandit, we advise truncating the value r̂(t) so that it does not explode. Figure 1 shows the cumulative regret R(t) over time t. |
| Researcher Affiliation | Academia | Gi-Soo Kim, Department of Statistics, Seoul National University (gisoo1989@snu.ac.kr); Myunghee Cho Paik, Department of Statistics, Seoul National University (myungheechopaik@snu.ac.kr) |
| Pseudocode | Yes | Algorithm 1 DR Lasso Bandit |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. No links to repositories or statements of code release are found. |
| Open Datasets | Yes | Yahoo! Webscope. Yahoo! Front Page Today Module User Click Log Dataset, version 1.0. http://webscope.sandbox.yahoo.com. Accessed: 09/01/2019. |
| Dataset Splits | No | The paper describes simulation studies and uses a dataset, but it does not specify explicit train/validation/test splits or cross-validation setup for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | Yes | Algorithm 1 DR Lasso Bandit. Input parameters: λ1, λ2, zT [...] For each algorithm, we consider several candidates for the tuning parameters and report the best results. For DR Lasso Bandit, we advise truncating the value r̂(t) so that it does not explode. [...] We set N = 10, 20, 50, or 100, d = 100, and s0 = 5. For fixed j = 1, ..., d, we generate [b_{1j}(t), ..., b_{Nj}(t)]^T from N(0_N, V), where V(i, i) = 1 for every i and V(i, k) = ρ² for every i ≠ k. We experiment with two cases for ρ²: ρ² = 0.3 (weak correlation) or ρ² = 0.7 (strong correlation). We generate η_i(t) i.i.d. N(0, 0.05²) and the rewards from (1). We set ‖β‖₀ = s0 and generate the s0 non-zero elements from a uniform distribution on [0, 1]. |
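The simulation environment quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the linear reward form r_i(t) = b_i(t)^T β + η_i(t) for the paper's model (1), and all names (`draw_round`, `B`, `rho2`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

N, d, s0, rho2 = 10, 100, 5, 0.3  # weak-correlation case from the paper

# Sparse regression parameter: s0 non-zero entries drawn from Uniform[0, 1].
beta = np.zeros(d)
beta[rng.choice(d, size=s0, replace=False)] = rng.uniform(0.0, 1.0, size=s0)

# Compound-symmetry covariance: V(i, i) = 1, V(i, k) = rho2 for i != k.
V = np.full((N, N), rho2)
np.fill_diagonal(V, 1.0)

def draw_round(rng):
    """Draw one round's arm contexts and rewards."""
    # For each coordinate j, the vector [b_1j(t), ..., b_Nj(t)]^T ~ N(0_N, V),
    # so the context matrix B (one row per arm) is correlated across arms.
    B = rng.multivariate_normal(np.zeros(N), V, size=d).T  # shape (N, d)
    eta = rng.normal(0.0, 0.05, size=N)                    # noise eta_i(t)
    rewards = B @ beta + eta                               # assumed linear model (1)
    return B, rewards

B, rewards = draw_round(rng)
```

Repeating `draw_round` for t = 1, ..., T and the four values of N (10, 20, 50, 100), with ρ² switched to 0.7 for the strong-correlation case, reproduces the quoted simulation design.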