On Learning Sparsely Used Dictionaries from Incomplete Samples

Authors: Thanh Nguyen, Akshay Soni, Chinmay Hegde

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments. We corroborate our theory by demonstrating some representative numerical benefits of our proposed algorithms. We generate a synthetic dataset based on the generative model described in Section 2. The ground-truth dictionary A is of size 256 × 256 with independent standard Gaussian entries. We normalize the columns of A to unit norm. Then we generate 6-sparse code vectors x with support drawn uniformly at random; entries on the support are sampled from ±1 with equal probability. We first generate full samples and isolate 5,000 of them as side information for the initialization step. The remaining samples are then subsampled at different rates. Figure 1 shows our experimental results, where sample size refers to the number of incomplete samples. Our algorithms recover the dictionary at sampling rates 0.6, 0.8, and 1.0. At rate 0.4, we observe a phase transition in the sample complexity of successful recovery around p = 10,000 samples. (A data-generation sketch appears below the table.)
Researcher Affiliation | Collaboration | Thanh V. Nguyen¹, Akshay Soni², Chinmay Hegde¹ (¹Iowa State University, ²Yahoo! Research)
Pseudocode | Yes | Algorithm 1: gradient descent-style algorithm; Algorithm 2: spectral initialization algorithm. (A hedged sketch of one descent iteration appears below the table.)
Open Source Code | Yes | An implementation of our method is available online at https://github.com/thanh-isu (footnote 3 in the paper).
Open Datasets | No | We generate a synthetic dataset based on the generative model described in Section 2. The paper does not provide access information for this synthetic dataset.
Dataset Splits | No | We generate full samples and isolate 5,000 samples as side information for the initialization step; the remaining are then subsampled at different rates. No explicit train/validation/test splits are provided.
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, processors, or memory used for running experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We set the number of iterations to T = 3000 in the initialization procedure and the number of descent steps to T = 50 for the descent scheme. In addition, we slightly modify the thresholding operator in the encoding step of Algorithm 1: we use an operator that keeps the k largest entries of the input untouched and sets everything else to zero, owing to its stability. (An operator sketch follows the table.)
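
The modified encoding operator described in the Experiment Setup row is a standard top-k hard-thresholding step. The sketch below is a minimal NumPy illustration of that operator, not code from the authors' repository; the function and variable names are ours.

```python
import numpy as np

def hard_topk(v, k):
    """Keep the k largest-magnitude entries of v untouched; set the rest to zero.

    Mirrors the modified thresholding operator the authors describe for the
    encoding step of Algorithm 1 (chosen for its stability)."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest magnitudes
    out[keep] = v[keep]
    return out

# Example with the paper's sparsity level k = 6: only the six
# largest-magnitude entries survive.
v = np.array([0.1, -2.0, 0.3, 1.5, -0.05, 0.7, 2.2, -1.1, 0.02, 0.4])
print(hard_topk(v, 6))
```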
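
The synthetic setup quoted in the Research Type and Dataset Splits rows is straightforward to reproduce. The following is a minimal sketch under stated assumptions: the subsampling is modeled as an entrywise Bernoulli mask (the extraction dropped the paper's symbol for the rate), and the sample count and random seed are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 256, 6, 20000   # ambient dimension, sparsity, no. of samples (p is our choice)
rate = 0.6                # observation rate; the paper sweeps 0.4, 0.6, 0.8, 1.0

# Ground-truth dictionary: 256 x 256 i.i.d. standard Gaussian, unit-norm columns.
A = rng.standard_normal((n, n))
A /= np.linalg.norm(A, axis=0)

# 6-sparse codes: support uniform at random, nonzeros +/-1 with equal probability.
X = np.zeros((n, p))
for j in range(p):
    support = rng.choice(n, size=k, replace=False)
    X[support, j] = rng.choice([-1.0, 1.0], size=k)
Y = A @ X                 # full samples

# 5,000 full samples are held out as side information for spectral initialization;
# the rest are subsampled entrywise (the Bernoulli mask is our modeling assumption).
Y_init, Y_rest = Y[:, :5000], Y[:, 5000:]
mask = rng.random(Y_rest.shape) < rate
Y_obs = mask * Y_rest
```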
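
Algorithm 1 itself is not reproduced in this report. As rough orientation only, the sketch below shows the generic shape of such gradient descent-style dictionary updates: encode by top-k thresholding of A^T Y (the same operator as hard_topk above, vectorized over columns), take an approximate gradient step, and re-normalize. It follows the Arora et al.-style template this line of work builds on, ignores the masking/rescaling needed for incomplete samples, and every name in it is ours rather than the paper's.

```python
import numpy as np

def descent_step(A, Y, k, eta=0.1):
    # Encoding step: column-wise top-k thresholding of A^T Y
    # (the modified operator from the Experiment Setup row).
    Z = A.T @ Y
    idx = np.argpartition(-np.abs(Z), k - 1, axis=0)[:k]
    X = np.zeros_like(Z)
    np.put_along_axis(X, idx, np.take_along_axis(Z, idx, axis=0), axis=0)

    # Approximate gradient step on the dictionary (Arora et al.-style
    # surrogate gradient; a stand-in, not the paper's exact update).
    G = (A @ X - Y) @ np.sign(X).T / Y.shape[1]
    A = A - eta * G

    # Keep columns unit-norm, matching the generative model.
    return A / np.linalg.norm(A, axis=0)
```

Per the Experiment Setup row, one would run T = 50 such descent steps after spectral initialization.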