High-dimensional Contextual Bandit Problem without Sparsity

Authors: Junpei Komiyama, Masaaki Imaizumi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an explore-then-commit (ETC) algorithm to address this problem and examine its performance. Through our analysis, we derive the optimal rate of the ETC algorithm in terms of T and show that this rate can be achieved by balancing exploration and exploitation. Moreover, we introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance. We assess the performance of the proposed algorithms through a series of simulations.
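The explore-then-commit idea summarized in the quoted abstract — explore uniformly, estimate each arm's linear model, then commit — can be sketched as follows. This is an illustrative simplification, not the authors' Algorithm 1: the round-robin exploration of length `n_explore`, the minimum-norm least-squares estimator via `np.linalg.pinv` (which works even when the dimension p exceeds the sample size, i.e. without sparsity), and the regret bookkeeping are all assumptions for the sketch.

```python
import numpy as np

def etc_contextual(T, n_explore, thetas, sigma, rng):
    """Explore-then-commit for a K-armed linear contextual bandit (sketch).

    thetas: (K, p) array of per-arm parameters; arm i's expected reward for
    context x is <x, thetas[i]>, observed with Gaussian noise of std sigma.
    Rounds 0..n_explore-1 pull arms round-robin (assumes n_explore >= K) and
    record (context, reward) pairs; afterwards each arm's parameter is
    estimated once by the minimum-norm least-squares solution and the
    estimated-best arm is played each remaining round.
    Returns cumulative regret against the per-round best arm.
    """
    K, p = thetas.shape
    contexts = [[] for _ in range(K)]
    rewards = [[] for _ in range(K)]
    theta_hat = None
    regret = 0.0
    for t in range(T):
        X = rng.standard_normal((K, p))       # one fresh context per arm
        means = (X * thetas).sum(axis=1)      # true expected rewards
        if t < n_explore:
            a = t % K                          # uniform exploration
            contexts[a].append(X[a])
            rewards[a].append(means[a] + sigma * rng.standard_normal())
        else:
            if theta_hat is None:              # commit once, at the switch
                theta_hat = np.stack([
                    np.linalg.pinv(np.asarray(contexts[i])) @ np.asarray(rewards[i])
                    for i in range(K)])
            a = int(np.argmax((X * theta_hat).sum(axis=1)))
        regret += means.max() - means[a]
    return regret
```

The exploration length `n_explore` is exactly the exploration/exploitation balance the paper tunes: too short and the committed estimates are poor, too long and exploration regret dominates.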
Researcher Affiliation | Academia | Junpei Komiyama, New York University, junpei@komiyama.info; Masaaki Imaizumi, The University of Tokyo / RIKEN Center for AIP, imaizumi@g.ecc.u-tokyo.ac.jp
Pseudocode | Yes | Algorithm 1: Explore-then-Commit (ETC) and Algorithm 2: Adaptive Explore-then-Commit (AEtC)
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor does it provide a direct link to a code repository.
Open Datasets | No | The paper describes data generating processes (DGPs) such as 'DGP 1: Σ = diag(λ_1, ..., λ_p), λ_k = k^(-0.5)' and states 'We generate X^(i)(t) from a centered p-dimensional Gaussian'. This indicates synthetic data generation rather than the use of a publicly available dataset with concrete access information (link, DOI, or formal citation to an established benchmark).
Dataset Splits | No | The paper describes an explore-then-commit strategy, which involves an 'exploration phase' where data is collected for estimation, followed by an 'exploitation phase'. However, it does not specify explicit train/validation/test dataset splits with percentages or sample counts for model training and evaluation in a typical supervised learning context. The data is generated on the fly based on DGPs.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its simulations.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, that would be needed to replicate the experiments.
Experiment Setup | Yes | We consider two setups: (K, p, T) = (2, 200, 500) and (K, p, T) = (10, 200, 1000). For each setup, we consider a covariance matrix Σ^(i) = c^(i) Σ with c^(i) ~ Uni([0.5, 1.5]) and a base covariance Σ for each i ∈ [K], which represents a heterogeneous covariance among the arms. The base covariance Σ follows one of the following configurations: DGP 1: Σ = diag(λ_1, ..., λ_p), λ_k = k^(-0.5); DGP 2: Σ = diag(λ_1, ..., λ_p), λ_k = exp(-k) + T exp(-T)/p; DGP 3: Σ = diag(λ_1, ..., λ_p), λ_k = k^(-1-1/T); and DGP 4: Σ_{i,j} = 0.3 and Σ_{i,i} = 0.7 for j ≠ i ∈ [p]. We generate X^(i)(t) from a centered p-dimensional Gaussian with covariance Σ^(i), θ^(i) from a standard normal distribution, which yields non-sparse parameter vectors, and Y^(i)(t) by the linear model (1) with noise variance σ² = 0.01. We compare the proposed AEtC (C_T = 2) to ESTC [Hao et al., 2020], LinUCB [Li et al., 2010, Abbasi-Yadkori et al., 2011], and DR Lasso Bandit [Kim and Paik, 2019].