Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning multivariate Gaussians with imperfect advice
Authors: Arnab Bhattacharyya, Davin Choo, Philips George John, Themis Gouleakis
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform two experiments on multivariate Gaussians of dimension d = 500 while varying two parameters: sparsity s ∈ [d] and advice quality q ∈ ℝ≥0. In both experiments, the difference vector µ − µ̃ ∈ ℝ^d is generated with random q/s values in the first s coordinates and zeros in the remaining d − s coordinates. In the first experiment (see Figure 2), we fix q = 50 and vary s ∈ {100, 200, 300}. In the second experiment (see Figure 3), we fix s = 100 and vary q ∈ {0.1, 20, 30}. In both experiments, we see that TESTANDOPTIMIZE beats the empirical mean estimate in terms of incurred ℓ2 error (which translates directly to d_TV), with diminishing benefits as q or s increases. |
| Researcher Affiliation | Academia | 1University of Warwick, United Kingdom 2Harvard University, United States of America 3CNRS@CREATE & National University of Singapore, Singapore 4Nanyang Technological University, Singapore. Correspondence to: Arnab Bhattacharyya <EMAIL>, Davin Choo <EMAIL>, Philips George John <EMAIL>, Themis Gouleakis <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 The TESTANDOPTIMIZEMEAN algorithm. ... Algorithm 2 The TOLERANTIGMT algorithm. ... Algorithm 3 TOLERANTZMGCT. ... Algorithm 4 The APPROXL1 algorithm. ... Algorithm 5 The VECTORIZEDAPPROXL1 algorithm. ... Algorithm 6 The TESTANDOPTIMIZECOVARIANCE algorithm. |
| Open Source Code | Yes | For reproducibility, our code and scripts are provided in the supplementary materials. |
| Open Datasets | No | We perform two experiments on multivariate Gaussians of dimension d = 500 while varying two parameters: sparsity s ∈ [d] and advice quality q ∈ ℝ≥0. In both experiments, the difference vector µ − µ̃ ∈ ℝ^d is generated with random q/s values in the first s coordinates and zeros in the remaining d − s coordinates. |
| Dataset Splits | No | The paper deals with learning properties of multivariate Gaussian distributions and generates samples from these distributions for experiments, rather than using a pre-existing dataset with predefined splits. |
| Hardware Specification | No | The paper discusses experimental results in Section 4, but does not provide any specific details regarding the hardware (e.g., GPU/CPU models, memory) used for these experiments. |
| Software Dependencies | No | For computational efficiency, we solve the LASSO optimization in its Lagrangian form µ̂ = argmin_{β ∈ ℝ^d} (1/n) Σ_{i=1}^{n} ‖y_i − β‖₂² + λ‖β‖₁, using the LassoLarsCV method in scikit-learn, instead of the equivalent penalized form. The value of the hyperparameter λ is chosen using 5-fold cross-validation. |
| Experiment Setup | Yes | We perform two experiments on multivariate Gaussians of dimension d = 500 while varying two parameters: sparsity s ∈ [d] and advice quality q ∈ ℝ≥0. In both experiments, the difference vector µ − µ̃ ∈ ℝ^d is generated with random q/s values in the first s coordinates and zeros in the remaining d − s coordinates. In the first experiment (see Figure 2), we fix q = 50 and vary s ∈ {100, 200, 300}. In the second experiment (see Figure 3), we fix s = 100 and vary q ∈ {0.1, 20, 30}. ... The value of the hyperparameter λ is chosen using 5-fold cross-validation. |
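The quoted setup (sparse difference vector, LASSO solved via scikit-learn's LassoLarsCV with cross-validated λ) can be sketched as follows. This is not the authors' code: the sample count n, the ± sign pattern on the q/s entries, and the reduction of the mean-estimation LASSO to a stacked-identity regression design are assumptions for illustration, and the dimension is scaled down from d = 500 for speed.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

rng = np.random.default_rng(0)

# Scaled-down stand-ins for the paper's parameters (d = 500, q = 50 there);
# n, the number of samples, is not stated in the quoted text.
d, s, q, n = 100, 20, 10.0, 50

# Difference vector mu - mu_tilde: random +/- q/s entries in the first s
# coordinates (the sign pattern is an assumption), zeros in the rest.
diff = np.zeros(d)
diff[:s] = rng.choice([-1.0, 1.0], size=s) * (q / s)

# Draw n samples y_i ~ N(diff, I_d).
samples = diff + rng.standard_normal((n, d))

# Cast the mean-estimation LASSO
#   argmin_beta (1/n) * sum_i ||y_i - beta||_2^2 + lambda * ||beta||_1
# as a regression on a stacked identity design, so LassoLarsCV can pick
# lambda by 5-fold cross-validation, as the paper describes.
X = np.tile(np.eye(d), (n, 1))   # one identity block per sample
y = samples.ravel()              # matching flattened targets
model = LassoLarsCV(cv=5, fit_intercept=False).fit(X, y)
beta_hat = model.coef_

l2_lasso = np.linalg.norm(beta_hat - diff)
l2_emp = np.linalg.norm(samples.mean(axis=0) - diff)
print(f"LASSO l2 error: {l2_lasso:.3f} vs empirical mean: {l2_emp:.3f}")
```

Because each identity block decouples the coordinates, the LASSO here reduces to soft-thresholding the empirical mean, which is why it can beat the raw empirical mean when the difference vector is sparse.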