High-dimensional Contextual Bandit Problem without Sparsity
Authors: Junpei Komiyama, Masaaki Imaizumi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance. Through our analysis, we derive the optimal rate of the EtC algorithm in terms of T and show that this rate can be achieved by balancing exploration and exploitation. Moreover, we introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance. We assess the performance of the proposed algorithms through a series of simulations. |
| Researcher Affiliation | Academia | Junpei Komiyama New York University junpei@komiyama.info Masaaki Imaizumi The University of Tokyo / RIKEN Center for AIP imaizumi@g.ecc.u-tokyo.ac.jp |
| Pseudocode | Yes | Algorithm 1 Explore-then-Commit (EtC) and Algorithm 2 Adaptive Explore-then-Commit (AEtC) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | No | The paper describes Data Generating Processes (DGPs) like 'DGP 1: Σ = diag(λ_1, ..., λ_d), λ_i = i^{-0.5}' and states 'We generate x^{(k)}(t) from a centered d-dimensional Gaussian'. This indicates synthetic data generation rather than the use of a publicly available dataset with concrete access information (link, DOI, or formal citation to an established benchmark). |
| Dataset Splits | No | The paper describes an explore-then-commit strategy, which involves an 'exploration phase' where data is collected for estimation, followed by an 'exploitation phase'. However, it does not specify explicit train/validation/test dataset splits with percentages or sample counts for model training and evaluation in a typical supervised learning context. The data is generated on-the-fly based on DGPs. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its simulations. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, that would be needed to replicate the experiments. |
| Experiment Setup | Yes | We consider two setups: (K, d, T) = (2, 200, 500) and (K, d, T) = (10, 200, 1000). For each setup, we consider a covariance matrix Σ^{(k)} = c^{(k)} Σ, where c^{(k)} ~ Uni([0.5, 1.5]) and Σ is a base covariance, for each arm k ∈ [K], which represents a heterogeneous covariance among the arms. The base covariance Σ follows one of the following configurations: DGP 1: Σ = diag(λ_1, ..., λ_d), λ_i = i^{-0.5}; DGP 2: Σ = diag(λ_1, ..., λ_d), λ_i = exp(-i) + i·exp(-d)/d; DGP 3: Σ = diag(λ_1, ..., λ_d), λ_i = i^{-1+1/d}; and DGP 4: Σ_{i,j} = 0.3 and Σ_{i,i} = 0.7 for i ≠ j ∈ [d]. We generate the contexts x^{(k)}(t) from a centered d-dimensional Gaussian with covariance Σ^{(k)}, draw θ^{(k)} from a standard normal distribution, which yields non-sparse parameter vectors, and generate the rewards y^{(k)}(t) by the linear model (1) with noise of variance σ² = 0.01. We compare the proposed method (AEtC, C = 2) to ESTC [Hao et al., 2020], LinUCB [Li et al., 2010; Abbasi-Yadkori et al., 2011], and DR Lasso Bandit [Kim and Paik, 2019]. |
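To make the Experiment Setup row concrete, here is a minimal sketch of the simulated environment and a plain explore-then-commit loop, assuming the reconstructed notation (K arms, dimension d, horizon T) and DGP 1 eigenvalues λ_i = i^{-0.5}. The fixed exploration length `n_explore`, the ridge estimator, and all names are illustrative choices, not the paper's AEtC procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

K, d, T = 2, 200, 500   # arms, dimension, horizon (first setup in the paper)
sigma2 = 0.01           # reward-noise variance sigma^2

# DGP 1: base covariance Sigma = diag(lambda_1, ..., lambda_d), lambda_i = i^{-0.5}
lam = np.arange(1, d + 1) ** -0.5

# Arm-wise covariance Sigma^{(k)} = c^{(k)} * Sigma with c^{(k)} ~ Uni([0.5, 1.5])
c = rng.uniform(0.5, 1.5, size=K)

# Non-sparse parameter vectors theta^{(k)}: every coordinate is standard normal
theta = rng.standard_normal((K, d))

def draw_contexts():
    """One context x^{(k)}(t) per arm from N(0, c^{(k)} * diag(lam))."""
    z = rng.standard_normal((K, d))
    return np.sqrt(c)[:, None] * np.sqrt(lam)[None, :] * z

n_explore = 100  # illustrative fixed split; the paper's AEtC tunes this adaptively
X = {k: [] for k in range(K)}
y = {k: [] for k in range(K)}
theta_hat = np.zeros((K, d))
regret = 0.0

for t in range(T):
    x = draw_contexts()
    mu = np.einsum("kd,kd->k", x, theta)      # true expected rewards
    if t < n_explore:
        arm = t % K                           # round-robin exploration
    else:
        if t == n_explore:                    # fit once, then commit
            for k in range(K):
                Xk, yk = np.array(X[k]), np.array(y[k])
                # ridge estimate: with n < d, plain least squares is ill-posed
                theta_hat[k] = np.linalg.solve(Xk.T @ Xk + np.eye(d), Xk.T @ yk)
        arm = int(np.argmax(np.einsum("kd,kd->k", x, theta_hat)))
    reward = mu[arm] + rng.normal(0.0, np.sqrt(sigma2))
    if t < n_explore:
        X[arm].append(x[arm])
        y[arm].append(reward)
    regret += mu.max() - mu[arm]              # per-round regret is nonnegative

print(f"cumulative regret over T={T}: {regret:.2f}")
```

A fixed `n_explore` under- or over-explores depending on the covariance spectrum, which is exactly the gap the adaptive variant (AEtC) is designed to close by choosing the split from data.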