Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery
Authors: Yassir Jedra, William Réveillard, Stefan Stojanovic, Alexandre Proutiere
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | J. Numerical experiments. We perform numerical experiments on synthetic data with a uniform context distribution. |
| Researcher Affiliation | Academia | (1) Laboratory for Information and Decision Systems, MIT, Cambridge, MA, USA; (2) Division of Decision and Control Systems, KTH, Stockholm, Sweden. |
| Pseudocode | Yes | Algorithm 1 RECOVER SUBSPACE FOR BEST POLICY IDENTIFICATION (RS-BPI) |
| Open Source Code | Yes | The code used in the experiments can be accessed at https://github.com/wilrev/Low_Rank_Bandits_Two_To_Infinity. |
| Open Datasets | No | We perform numerical experiments on synthetic data with a uniform context distribution. Unless specified otherwise, the behavior policy is uniform, the target policy is chosen as the best policy: $\pi(i) = \arg\max_j M_{i,j}$ (ties are broken arbitrarily), and we generate noisy entries $M_{i_t,j_t} + \xi_t$ where $\xi_t \sim \mathcal{N}(0, 1)$ is standard Gaussian, and where $M = PDQ$ for two invertible matrices $P \in \mathbb{R}^{m \times m}$, $Q \in \mathbb{R}^{n \times n}$, and $D \in \mathbb{R}^{m \times n}$ defined by $D_{i,j} = \mathbf{1}_{i=j}\mathbf{1}_{i \le r}$ (note that $M$ is consequently of rank $r$). $P$ and $Q$ are initially generated at random with uniform entries in $[0, 1]$ and their diagonal elements are replaced by the sum of the corresponding row to ensure invertibility. |
| Dataset Splits | No | The paper uses synthetic data generated internally and describes experimental repetitions and confidence intervals, but it does not specify explicit train/validation/test dataset splits or reference any standard split methodologies for public datasets. |
| Hardware Specification | No | The paper mentions running numerical experiments but does not provide specific details about the hardware used, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or solver versions). |
| Experiment Setup | Yes | For a regularization parameter of $\tau = 10^{-4}$, we compare the performance of RS-PE for $\alpha \in \{1/5, 1/2, 4/5\}$, where $\alpha$ is the proportion of samples used in the first phase of the algorithm. |
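
The data-generation procedure quoted in the Open Datasets row above lends itself to a short illustration. Below is a minimal sketch, not the authors' released code: the function name `generate_low_rank_matrix`, the dimensions `m`, `n`, `r`, the sample count `T`, and the seed are all assumptions, chosen only to show how the rank-$r$ matrix $M = PDQ$, the best policy, and the noisy observations described above could be generated with NumPy.

```python
import numpy as np

def generate_low_rank_matrix(m, n, r, rng):
    """Build M = P D Q of rank r, following the construction quoted above.

    P and Q have i.i.d. Uniform[0, 1] entries, and each diagonal element is
    replaced by the sum of its row, making the matrices strictly diagonally
    dominant (entries are non-negative) and hence invertible.
    D is m x n with D[i, j] = 1 exactly when i == j and i < r (0-indexed).
    """
    P = rng.uniform(0.0, 1.0, size=(m, m))
    Q = rng.uniform(0.0, 1.0, size=(n, n))
    np.fill_diagonal(P, P.sum(axis=1))
    np.fill_diagonal(Q, Q.sum(axis=1))
    D = np.zeros((m, n))
    idx = np.arange(min(m, n, r))
    D[idx, idx] = 1.0
    return P @ D @ Q

# Hypothetical usage: dimensions, rank, horizon, and seed are illustrative.
rng = np.random.default_rng(0)
m, n, r = 100, 100, 3
M = generate_low_rank_matrix(m, n, r, rng)
best_policy = M.argmax(axis=1)          # pi(i) = argmax_j M[i, j]

T = 10_000
contexts = rng.integers(0, m, size=T)   # uniform context distribution
actions = rng.integers(0, n, size=T)    # uniform behavior policy
rewards = M[contexts, actions] + rng.standard_normal(T)  # xi_t ~ N(0, 1)
```

Note that `argmax` returns the first maximizer, whereas the quoted text allows ties to be broken arbitrarily; replacing each diagonal entry by its row sum is one standard way to enforce the invertibility the construction calls for.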