Beyond Sub-Gaussian Measurements: High-Dimensional Structured Estimation with Sub-Exponential Designs
Authors: Vidyashankar Sivakumar, Arindam Banerjee, Pradeep K. Ravikumar
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments on synthetic data to compare estimation errors for Gaussian and sub-exponential design matrices and noise, for both the ℓ1 and group-sparse norms. For ℓ1 we run experiments with dimensionality p = 300 and sparsity level s = 10. For group-sparse norms we run experiments with dimensionality p = 300, maximum group size m = 6, NG = 50 groups each of size 6, and 4 non-zero groups. For the design matrix X, in the Gaussian case we sample rows randomly from an isotropic Gaussian distribution, while for sub-exponential design matrices we sample each row of X randomly from an isotropic extreme-value distribution. The number of samples n in X starts at 5 and is incremented in steps of 10. The noise ω is sampled i.i.d. from the Gaussian and extreme-value distributions, each with variance 1, for the Gaussian and sub-exponential cases respectively. For each sample size n, we repeat the procedure above 100 times, and all results reported in the plots are averages over the 100 runs. We report two sets of results. Figure 1 shows percentage of success vs. sample size for the noiseless case, y = Xθ. Figure 2 shows average estimation error vs. number of samples for the noisy case, y = Xθ + ω. Hedged code sketches of both experiments appear after the table. |
| Researcher Affiliation | Academia | Vidyashankar Sivakumar and Arindam Banerjee, Department of Computer Science & Engineering, University of Minnesota, Twin Cities ({sivakuma,banerjee}@cs.umn.edu); Pradeep Ravikumar, Department of Computer Science, University of Texas, Austin (pradeepr@cs.utexas.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper uses 'synthetic data' and describes its generation process: 'For the design matrix X, for the Gaussian case we sample rows randomly from an isotropic Gaussian distribution, while for sub-exponential design matrices we sample each row of X randomly from an isotropic extreme-value distribution. ... For the noise ω, it is sampled i.i.d. from the Gaussian and extreme-value distributions with variance 1'. It does not refer to or provide access to any publicly available dataset. |
| Dataset Splits | No | The paper states 'For each sample size n, we repeat the procedure above 100 times and all results reported in the plots are average values over the 100 runs.' It does not specify explicit training, validation, or test splits by percentages or sample counts, nor does it reference standard predefined splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers. |
| Experiment Setup | Yes | For ℓ1 we run experiments with dimensionality p = 300 and sparsity level s = 10. For group-sparse norms we run experiments with dimensionality p = 300, maximum group size m = 6, NG = 50 groups each of size 6, and 4 non-zero groups. For the design matrix X, in the Gaussian case we sample rows randomly from an isotropic Gaussian distribution, while for sub-exponential design matrices we sample each row of X randomly from an isotropic extreme-value distribution. The number of samples n in X starts at 5 and is incremented in steps of 10. The noise ω is sampled i.i.d. from the Gaussian and extreme-value distributions, each with variance 1, for the Gaussian and sub-exponential cases respectively. For each sample size n, we repeat the procedure above 100 times, and all results reported in the plots are averages over the 100 runs. (The first sketch below reconstructs this setup in code.) |
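
The Experiment Setup row specifies the synthetic data generation closely enough to reconstruct the noisy ℓ1 experiment (Figure 2) in code. The sketch below is a minimal reconstruction, not the authors' implementation: it assumes the 'isotropic extreme-value distribution' is the standard Gumbel, centered and scaled to unit variance; the signal law for the non-zero coefficients (standard normal here) and the Lasso regularization λ ∝ √(log p / n) are conventional choices the paper does not state; `standardized_gumbel`, `make_problem`, and `avg_error` are hypothetical helper names.

```python
import numpy as np
from sklearn.linear_model import Lasso

EULER_GAMMA = 0.5772156649015329  # mean of the standard Gumbel distribution

def standardized_gumbel(size, rng):
    """Zero-mean, unit-variance Gumbel draws (a sub-exponential law).
    Assumption: the paper's 'isotropic extreme-value distribution' is the
    standard Gumbel, centered and rescaled (its std is pi / sqrt(6))."""
    return (rng.gumbel(size=size) - EULER_GAMMA) / (np.pi / np.sqrt(6.0))

def make_problem(n, p=300, s=10, design="gaussian", noisy=True, rng=None):
    """One synthetic l1 instance: s-sparse theta in R^p, with the rows of X
    and the noise drawn i.i.d. from the chosen isotropic distribution."""
    rng = rng if rng is not None else np.random.default_rng()
    theta = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    theta[support] = rng.standard_normal(s)  # assumed signal law; paper omits it
    if design == "gaussian":
        X = rng.standard_normal((n, p))
        noise = rng.standard_normal(n)
    else:  # sub-exponential design and noise
        X = standardized_gumbel((n, p), rng)
        noise = standardized_gumbel(n, rng)
    y = X @ theta + (noise if noisy else 0.0)
    return X, y, theta

def avg_error(n, design, runs=100):
    """Average l2 estimation error over `runs` repetitions (Figure 2's y-axis)."""
    rng = np.random.default_rng(0)
    errs = []
    for _ in range(runs):
        X, y, theta = make_problem(n, design=design, rng=rng)
        lam = np.sqrt(np.log(X.shape[1]) / n)  # conventional Lasso scaling; an assumption
        fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10_000).fit(X, y)
        errs.append(np.linalg.norm(fit.coef_ - theta))
    return float(np.mean(errs))

# Sample sizes start at n = 5 and grow in steps of 10, as described above.
for n in range(5, 206, 10):
    print(n, avg_error(n, "gaussian"), avg_error(n, "subexp"))
```

The group-sparse experiment would replace the ℓ1 penalty with the non-overlapping group norm over the 50 groups of size 6; scikit-learn has no built-in group Lasso, so that variant is not sketched here.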
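
Figure 1's percentage-of-success experiment drops the noise and asks how often the ℓ1 program exactly recovers θ. The paper names neither the solver nor the success criterion, so the sketch below assumes basis pursuit (min ‖θ‖₁ s.t. Xθ = y) solved as a linear program with SciPy's `linprog`, and counts a run as a success when the ℓ2 recovery error falls below a hypothetical tolerance `tol`. It reuses `make_problem` from the sketch above.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, y):
    """min ||theta||_1 s.t. X theta = y, as an LP: split theta = u - v with
    u, v >= 0 and minimize sum(u) + sum(v) subject to X(u - v) = y."""
    n, p = X.shape
    c = np.ones(2 * p)
    A_eq = np.hstack([X, -X])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:]

def success_rate(n, design, runs=100, tol=1e-4):
    """Fraction of runs with exact recovery up to `tol` (Figure 1's y-axis).
    The tolerance is an assumption; the paper does not state its criterion."""
    rng = np.random.default_rng(0)
    hits = 0
    for _ in range(runs):
        X, y, theta = make_problem(n, design=design, noisy=False, rng=rng)
        hits += np.linalg.norm(basis_pursuit(X, y) - theta) <= tol
    return hits / runs
```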