Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
On Learning Sparsely Used Dictionaries from Incomplete Samples
Authors: Thanh Nguyen, Akshay Soni, Chinmay Hegde
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments We corroborate our theory by demonstrating some representative numerical bene๏ฌts of our proposed algorithms. We generate a synthetic dataset based on the generative model described in Section 2. The ground truth dictionary A is of size 256 256 with independent standard Gaussian entries. We normalize columns of A to be unit norm. Then, we generate 6-sparse code vectors x with support drawn uniformly at random. Entries in the support are sampled from 1 with equal probability. We generate all full samples, and isolate 5000 samples as side information for the initialization step. The remaining are then subsampled with different parameters . Figure 1 shows our experimental results. Here, sample size refers to the number of incomplete samples. Our algorithms are able to recover the dictionary for = 0.6, 0.8, 1.0. For = 0.4, we can observe a phase transition in sample complexity of successful recovery around p = 10, 000 samples. |
| Researcher Affiliation | Collaboration | Thanh V. Nguyen 1 Akshay Soni 2 Chinmay Hegde 1 1Iowa State University 2Yahoo! Research. |
| Pseudocode | Yes | Algorithm 1 Gradient descent-style algorithm; Algorithm 2 Spectral initialization algorithm |
| Open Source Code | Yes | An implementation of our method is available online3. 3https://github.com/thanh-isu |
| Open Datasets | No | We generate a synthetic dataset based on the generative model described in Section 2. The paper does not provide access information for this synthetic dataset. |
| Dataset Splits | No | We generate all full samples, and isolate 5000 samples as side information for the initialization step. The remaining are then subsampled with different parameters . No explicit train/validation/test splits are provided. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, processors, or memory used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We set the number of iterations to T = 3000 in the initialization procedure and the number of descent steps T = 50 for the descent scheme. Besides, we slightly modify the thresholding operator in the encoding step of Algorithm 1. We use another operator that keeps k largest entries of the input untouched and sets everything else to zero due to its stability. |