reproducibilityindex.ai

On Learning Sparsely Used Dictionaries from Incomplete Samples

Authors: Thanh Nguyen, Akshay Soni, Chinmay Hegde

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5. Experiments We corroborate our theory by demonstrating some representative numerical beneﬁts of our proposed algorithms. We generate a synthetic dataset based on the generative model described in Section 2. The ground truth dictionary A is of size 256 256 with independent standard Gaussian entries. We normalize columns of A to be unit norm. Then, we generate 6-sparse code vectors x with support drawn uniformly at random. Entries in the support are sampled from 1 with equal probability. We generate all full samples, and isolate 5000 samples as side information for the initialization step. The remaining are then subsampled with different parameters . Figure 1 shows our experimental results. Here, sample size refers to the number of incomplete samples. Our algorithms are able to recover the dictionary for = 0.6, 0.8, 1.0. For = 0.4, we can observe a phase transition in sample complexity of successful recovery around p = 10, 000 samples.
Researcher Affiliation	Collaboration	Thanh V. Nguyen 1 Akshay Soni 2 Chinmay Hegde 1 1Iowa State University 2Yahoo! Research.
Pseudocode	Yes	Algorithm 1 Gradient descent-style algorithm; Algorithm 2 Spectral initialization algorithm
Open Source Code	Yes	An implementation of our method is available online3. 3https://github.com/thanh-isu
Open Datasets	No	We generate a synthetic dataset based on the generative model described in Section 2. The paper does not provide access information for this synthetic dataset.
Dataset Splits	No	We generate all full samples, and isolate 5000 samples as side information for the initialization step. The remaining are then subsampled with different parameters . No explicit train/validation/test splits are provided.
Hardware Specification	No	The paper does not specify any hardware details such as GPU/CPU models, processors, or memory used for running experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	We set the number of iterations to T = 3000 in the initialization procedure and the number of descent steps T = 50 for the descent scheme. Besides, we slightly modify the thresholding operator in the encoding step of Algorithm 1. We use another operator that keeps k largest entries of the input untouched and sets everything else to zero due to its stability.