Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Recovery and Generalization in Over-Realized Dictionary Learning

Authors: Jeremias Sulam, Chong You, Zhihui Zhu

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work characterizes the surprising phenomenon that dictionary recovery can be facilitated by searching over the space of larger over-realized models. This observation is general and independent of the specific dictionary learning algorithm used. We thoroughly demonstrate this observation in practice and provide an analysis of this phenomenon by tying recovery measures to generalization bounds. In particular, we show that model recovery can be upper-bounded by the empirical risk, a model-dependent quantity, and the generalization gap, reflecting our empirical findings. We further show that an efficient and provably correct distillation approach can be employed to recover the correct atoms from the over-realized model. As a result, our meta-algorithm provides dictionary estimates with consistently better recovery of the ground-truth model. ... As a motivating example, we construct the following experimental setting. Data is sampled as described in the previous section from a ground-truth dictionary (with normalized Gaussian atoms) of size 50 × 70, from representations with cardinality k = 3. We construct 300 such samples for training, leaving 1000 to estimate the population statistics. As a learning algorithm, we employ ODL (Mairal et al., 2010) for 2000 iterations, which are more than sufficient for convergence. We employ OMP for the sparse coding step. In Figure 1a we depict the risk, or error, on both training and test sets, as a function of the number of atoms in the estimated dictionary D̂, from 70 (the size of the ground-truth model) to 500. We repeat the experiment 20 times, and present the mean together with the 25% and 75% percentiles. Interestingly, both train and testing errors, shown in Figure 1a, improve with increasing dictionary size p′ > p within some range.
Researcher Affiliation | Academia | Jeremias Sulam (EMAIL), Department of Biomedical Engineering & Mathematical Institute for Data Science, Johns Hopkins University, Baltimore, MD 21205, USA; Chong You (EMAIL), Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, CA 94720-1776, USA; Zhihui Zhu (EMAIL), Department of Electrical & Computer Engineering, University of Denver, Denver, CO 80210, USA
Pseudocode | Yes | Algorithm 1: Meta-algorithm for over-realized dictionary learning. Data: set of n training samples X ∈ R^(d×n); dictionary size p; and sparse coding parameters (cardinality or penalty) λ. Initialization: choose p′ > p and a dictionary learning method LearnD(p, λ). Dictionary Learning: D̂ ← LearnD(p′, λ). Distillation: D̃ ← {top p most-used atoms in D̂}. Result: estimated dictionary D̃.
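A minimal sketch of the meta-algorithm above, substituting scikit-learn's `MiniBatchDictionaryLearning` (an implementation of the online method of Mairal et al., 2010) for the authors' SPAMS-based ODL; the usage-counting rule for the "top p most-used atoms" is our reading of the distillation step, and `p_over` and `k` are assumed parameter names.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def overrealized_then_distill(X, p, p_over, k, seed=0):
    """Learn an over-realized dictionary (p_over > p atoms), then
    distill it down to the p atoms most used by the training codes."""
    # Dictionary Learning: fit an over-realized dictionary with OMP sparse coding.
    dl = MiniBatchDictionaryLearning(
        n_components=p_over,
        transform_algorithm="omp",
        transform_n_nonzero_coefs=k,
        random_state=seed,
    )
    dl.fit(X.T)                   # sklearn expects samples in rows
    D_over = dl.components_.T     # d x p_over learned dictionary
    # Sparse-code the training data against the learned dictionary.
    Z = dl.transform(X.T).T       # p_over x n code matrix
    # Distillation: rank atoms by how often they appear in the supports.
    usage = np.count_nonzero(Z, axis=1)
    top = np.argsort(usage)[::-1][:p]
    return D_over[:, top]         # d x p distilled dictionary
```

This is a sketch under stated assumptions, not the paper's implementation; any provably correct distillation guarantee applies to the authors' procedure, not necessarily to this substitute.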
Open Source Code | No | Available at spams-devel.gforge.inria.fr/. Note that ODL can accommodate different formulations and algorithms for the sparse coding step (and not just an ℓ1 minimization), which will enable us to explore different experimental settings.
Open Datasets | No | Data is sampled as described in the previous section from a ground-truth dictionary (with normalized Gaussian atoms) of size 50 × 70, from representations with cardinality k = 3. We construct 300 such samples for training, leaving 1000 to estimate the population statistics.
Dataset Splits | Yes | Data is sampled as described in the previous section from a ground-truth dictionary (with normalized Gaussian atoms) of size 50 × 70, from representations with cardinality k = 3. We construct 300 such samples for training, leaving 1000 to estimate the population statistics.
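The sampling scheme quoted above can be sketched as follows. The excerpt specifies only normalized Gaussian atoms, a 50 × 70 dictionary, cardinality k = 3, and the 300/1000 split; the Gaussian coefficient values and uniform support choice are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth dictionary: 50 x 70, unit-norm Gaussian atoms.
d, p, k = 50, 70, 3
n_train, n_test = 300, 1000

D = rng.standard_normal((d, p))
D /= np.linalg.norm(D, axis=0)

def sample(n):
    """Draw n samples x = D z with k-sparse coefficients z
    (support chosen uniformly, values Gaussian: an assumption)."""
    X = np.empty((d, n))
    for i in range(n):
        support = rng.choice(p, size=k, replace=False)
        z = np.zeros(p)
        z[support] = rng.standard_normal(k)
        X[:, i] = D @ z
    return X

X_train, X_test = sample(n_train), sample(n_test)
```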
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper mentions ODL (Mairal et al., 2010), OMP, K-SVD, and Lasso as algorithms or methods used, and links to spams-devel.gforge.inria.fr/ for ODL, but it does not provide version numbers for any software, libraries, or programming languages used in the implementation.
Experiment Setup | Yes | Data is sampled as described in the previous section from a ground-truth dictionary (with normalized Gaussian atoms) of size 50 × 70, from representations with cardinality k = 3. We construct 300 such samples for training, leaving 1000 to estimate the population statistics. As a learning algorithm, we employ ODL (Mairal et al., 2010) for 2000 iterations, which are more than sufficient for convergence. We employ OMP for the sparse coding step. In Figure 1a we depict the risk, or error, on both training and test sets, as a function of the number of atoms in the estimated dictionary D̂, from 70 (the size of the ground-truth model) to 500. We repeat the experiment 20 times, and present the mean together with the 25% and 75% percentiles. ... As one can see, the benefits of searching over larger models deteriorate smoothly with increasing noise, both for the general over-realized model as well as for our practical distillation approach.
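The Figure 1a sweep (train/test risk as a function of dictionary size) could be reproduced along these lines; this is a scaled-down sketch using scikit-learn's online dictionary learning in place of the authors' ODL setup, so the sizes and iteration budget here are placeholders, not the paper's 2000 iterations over p′ from 70 to 500.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def risk(dl, X):
    """Mean squared reconstruction error of the fitted model on X (d x n)."""
    Z = dl.transform(X.T)                 # n x p' OMP codes
    R = X.T - Z @ dl.components_          # residuals per sample
    return float(np.mean(np.sum(R ** 2, axis=1)))

def sweep(X_train, X_test, sizes, k, seed=0):
    """For each candidate dictionary size p', fit on the training set and
    record (p', train risk, test risk), mirroring the Figure 1a curves."""
    results = []
    for p_over in sizes:
        dl = MiniBatchDictionaryLearning(
            n_components=p_over,
            transform_algorithm="omp",
            transform_n_nonzero_coefs=k,
            random_state=seed,
        ).fit(X_train.T)
        results.append((p_over, risk(dl, X_train), risk(dl, X_test)))
    return results
```

Repeating the sweep over several random seeds and plotting the mean with 25%/75% percentile bands would mirror how the paper reports its curves.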