On Learning Sparsely Used Dictionaries from Incomplete Samples
Authors: Thanh Nguyen, Akshay Soni, Chinmay Hegde
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments: We corroborate our theory by demonstrating representative numerical benefits of our proposed algorithms. We generate a synthetic dataset based on the generative model described in Section 2. The ground-truth dictionary A is of size 256 × 256 with independent standard Gaussian entries, and we normalize the columns of A to unit norm. We then generate 6-sparse code vectors x with support drawn uniformly at random; entries in the support are sampled from ±1 with equal probability. We generate all full samples and isolate 5,000 samples as side information for the initialization step; the remaining samples are then subsampled at different rates. Figure 1 shows our experimental results, where sample size refers to the number of incomplete samples. Our algorithms recover the dictionary for subsampling rates 0.6, 0.8, and 1.0. For rate 0.4, we observe a phase transition in the sample complexity of successful recovery around p = 10,000 samples. |
| Researcher Affiliation | Collaboration | Thanh V. Nguyen 1 Akshay Soni 2 Chinmay Hegde 1 1Iowa State University 2Yahoo! Research. |
| Pseudocode | Yes | Algorithm 1 Gradient descent-style algorithm; Algorithm 2 Spectral initialization algorithm |
| Open Source Code | Yes | An implementation of our method is available online: https://github.com/thanh-isu |
| Open Datasets | No | We generate a synthetic dataset based on the generative model described in Section 2. The paper does not provide access information for this synthetic dataset. |
| Dataset Splits | No | We generate all full samples and isolate 5,000 samples as side information for the initialization step; the remaining samples are then subsampled at different rates. No explicit train/validation/test splits are provided. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, processors, or memory used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We set the number of iterations to T = 3000 in the initialization procedure and the number of descent steps to T = 50 for the descent scheme. In addition, we slightly modify the thresholding operator in the encoding step of Algorithm 1: we use an operator that keeps the k largest entries of the input untouched and sets all other entries to zero, owing to its stability. |
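The generative model quoted above (unit-norm Gaussian dictionary, k-sparse ±1 codes, Bernoulli subsampling) can be sketched as follows. This is a minimal NumPy reconstruction from the table's description, not the authors' released code; the variable names and the smaller sample count `p` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the experiment description: 256 x 256 dictionary,
# 6-sparse codes. The sample count p here is illustrative only.
n, m, k, p = 256, 256, 6, 1000

# Ground-truth dictionary: independent standard Gaussian entries,
# columns normalized to unit norm.
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)

# k-sparse code vectors: support drawn uniformly at random,
# nonzero entries sampled from {-1, +1} with equal probability.
X = np.zeros((m, p))
for j in range(p):
    support = rng.choice(m, size=k, replace=False)
    X[support, j] = rng.choice([-1.0, 1.0], size=k)

# Full samples, then entrywise Bernoulli subsampling at rate rho
# (one of the subsampling parameters varied in the experiments).
Y = A @ X
rho = 0.6
mask = rng.random((n, p)) < rho
Y_obs = mask * Y
```

Varying `rho` over {0.4, 0.6, 0.8, 1.0} reproduces the sweep described in the experiments, with `rho = 1.0` corresponding to fully observed samples.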
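The modified encoding step in the setup row, keeping the k largest-magnitude entries untouched and zeroing the rest, is a standard hard-thresholding (top-k) operator. A minimal sketch, assuming the paper's description; the function name is hypothetical:

```python
import numpy as np

def keep_top_k(z, k):
    """Keep the k largest-magnitude entries of z; set all others to zero.

    Mirrors the modified thresholding operator described for the
    encoding step of Algorithm 1: retained entries are left untouched
    rather than shrunk, which the authors report is more stable.
    """
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    if k <= 0:
        return out
    # argpartition finds the indices of the k largest |z_i| in O(len(z)).
    idx = np.argpartition(np.abs(z), -k)[-k:]
    out[idx] = z[idx]
    return out
```

For example, `keep_top_k([3.0, -1.0, 0.5, 2.0], 2)` keeps the entries 3.0 and 2.0 and zeroes the rest, matching the 6-sparse codes used in the experiments when k = 6.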