Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery

Authors: Yassir Jedra, William Réveillard, Stefan Stojanovic, Alexandre Proutiere

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | J. Numerical experiments. We perform numerical experiments on synthetic data with a uniform context distribution.
Researcher Affiliation | Academia | ¹Laboratory for Information and Decision Systems, MIT, Cambridge, MA, USA; ²Division of Decision and Control Systems, KTH, Stockholm, Sweden.
Pseudocode | Yes | Algorithm 1: Recover Subspace for Best Policy Identification (RS-BPI)
Open Source Code | Yes | The code used in the experiments can be accessed at https://github.com/wilrev/Low-Rank-Bandits-Two-To-Infinity.
Open Datasets | No | We perform numerical experiments on synthetic data with a uniform context distribution. Unless specified otherwise, the behavior policy is uniform, the target policy is chosen as the best policy: $\pi(i) = \arg\max_j M_{i,j}$ (ties are broken arbitrarily), and we generate noisy entries $M_{i_t,j_t} + \xi_t$ where $\xi_t \sim \mathcal{N}(0,1)$ is standard Gaussian, and where $M = PDQ$ for two invertible matrices $P \in \mathbb{R}^{m \times m}$, $Q \in \mathbb{R}^{n \times n}$, and $D \in \mathbb{R}^{m \times n}$ defined by $D_{i,j} = \mathbf{1}_{\{i=j\}}\mathbf{1}_{\{i \le r\}}$ (note that $M$ is consequently of rank $r$). $P$ and $Q$ are initially generated at random with uniform entries in $[0,1]$, and their diagonal elements are replaced by the sum of the corresponding row to ensure invertibility. (A code sketch of this generator appears after the table.)
Dataset Splits | No | The paper uses synthetic data generated internally and describes experimental repetitions and confidence intervals, but it does not specify explicit train/validation/test dataset splits or reference any standard split methodologies for public datasets.
Hardware Specification | No | The paper mentions running numerical experiments but does not provide specific details about the hardware used, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or solver versions).
Experiment Setup | Yes | For a regularization parameter of $\tau = 10^{-4}$, we compare the performance of RS-PE for $\alpha \in \{1/5, 1/2, 4/5\}$, where $\alpha$ is the proportion of samples used in the first phase of the algorithm. (A sketch of this phase split also appears after the table.)
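
The synthetic-data description quoted in the Open Datasets row is specific enough to sketch directly. Below is a minimal NumPy sketch of that generator, assuming the paper's 1-indexed condition $i \le r$ maps to the first $r$ diagonal entries of $D$; the function name, seed handling, and dimensions are illustrative choices, not taken from the authors' repository.

```python
import numpy as np

def make_rank_r_matrix(m, n, r, seed=None):
    """Build M = P D Q as in the quoted setup: P (m x m) and Q (n x n)
    have i.i.d. Uniform[0, 1] entries, each diagonal entry is then
    replaced by the sum of its row, and D is the m x n matrix with ones
    on its first r diagonal entries, so that M has rank r."""
    rng = np.random.default_rng(seed)
    P = rng.uniform(size=(m, m))
    Q = rng.uniform(size=(n, n))
    # Replace each diagonal entry by its row sum to ensure invertibility.
    P[np.diag_indices(m)] = P.sum(axis=1)
    Q[np.diag_indices(n)] = Q.sum(axis=1)
    D = np.zeros((m, n))
    D[np.arange(r), np.arange(r)] = 1.0  # identity truncated at rank r
    return P @ D @ Q

rng = np.random.default_rng(0)
M = make_rank_r_matrix(m=50, n=50, r=5, seed=0)
best_policy = M.argmax(axis=1)             # pi(i) = argmax_j M[i, j]
i_t, j_t = 3, 7                            # an arbitrary (context, arm) pair
y_t = M[i_t, j_t] + rng.standard_normal()  # noisy entry with N(0, 1) noise
```

Replacing each diagonal entry by its row sum of nonnegative values makes $P$ and $Q$ strictly diagonally dominant, which is what guarantees the invertibility the quoted text asks for.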
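
The Experiment Setup row fixes the regularization parameter at $\tau = 10^{-4}$ and varies only $\alpha$, the fraction of the sample budget spent in RS-PE's first phase. The sketch below illustrates that budget split; `split_budget` and the total budget of 10,000 samples are hypothetical illustrations, not the authors' API.

```python
def split_budget(T, alpha):
    """Allocate a fraction alpha of a total budget of T samples to the
    first phase of a two-phase algorithm such as RS-PE, and the
    remainder to the second phase."""
    n1 = int(alpha * T)
    return n1, T - n1

tau = 1e-4  # regularization parameter from the quoted setup
for alpha in (1/5, 1/2, 4/5):
    n1, n2 = split_budget(T=10_000, alpha=alpha)
    print(f"alpha={alpha}: phase 1 -> {n1} samples, phase 2 -> {n2} samples")
```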