Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Recovery and Generalization in Over-Realized Dictionary Learning
Authors: Jeremias Sulam, Chong You, Zhihui Zhu
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work characterizes the surprising phenomenon that dictionary recovery can be facilitated by searching over the space of larger over-realized models. This observation is general and independent of the specific dictionary learning algorithm used. We thoroughly demonstrate this observation in practice and provide an analysis of this phenomenon by tying recovery measures to generalization bounds. In particular, we show that model recovery can be upper-bounded by the empirical risk, a model-dependent quantity, and the generalization gap, reflecting our empirical findings. We further show that an efficient and provably correct distillation approach can be employed to recover the correct atoms from the over-realized model. As a result, our meta-algorithm provides dictionary estimates with consistently better recovery of the ground-truth model. ... As a motivating example, we construct the following experimental setting. Data is sampled as described in the previous section from a ground-truth dictionary (with normalized Gaussian atoms) of size 50 × 70, from representations with cardinality k = 3. We construct 300 such samples for training, leaving 1000 to estimate the population statistics. As a learning algorithm, we employ ODL (Mairal et al., 2010) for 2000 iterations, which are more than sufficient for convergence. We employ OMP for the sparse coding step. In Figure 1a we depict the risk, or error, on both training and test sets, as a function of the number of atoms in the estimated dictionary D̂, from 70 (the size of the ground-truth model) to 500. We repeat the experiment 20 times, and present the mean together with the 25th and 75th percentiles. Interestingly, both train and test errors, shown in Figure 1a, improve with increasing dictionary size p′ > p within some range. |
| Researcher Affiliation | Academia | Jeremias Sulam EMAIL Department of Biomedical Engineering & Mathematical Institute for Data Science Johns Hopkins University Baltimore, MD 21205, USA Chong You EMAIL Department of Electrical Engineering & Computer Sciences University of California Berkeley, CA 94720-1776, USA Zhihui Zhu EMAIL Department of Electrical & Computer Engineering University of Denver Denver, CO 80210, USA |
| Pseudocode | Yes | Algorithm 1: Meta-algorithm for over-realized dictionary learning. Data: set of n training samples X ∈ ℝ^(d×n); dictionary size p; sparse coding parameter (cardinality or penalty) λ. Initialization: choose p′ > p and a dictionary learning method Learn_D(p, λ). Dictionary Learning: D̂ ← Learn_D(p′, λ). Distillation: D̃ ← {top p most-used atoms in D̂}. Result: estimated dictionary D̃. |
| Open Source Code | No | Available at spams-devel.gforge.inria.fr/. Note that ODL can accommodate different formulations and algorithms for the sparse coding step (and not just an ℓ1 minimization), which will enable us to explore different experimental settings. |
| Open Datasets | No | Data is sampled as described in the previous section from a ground-truth dictionary (with normalized Gaussian atoms) of size 50 × 70, from representations with cardinality k = 3. We construct 300 such samples for training, leaving 1000 to estimate the population statistics. |
| Dataset Splits | Yes | Data is sampled as described in the previous section from a ground-truth dictionary (with normalized Gaussian atoms) of size 50 × 70, from representations with cardinality k = 3. We construct 300 such samples for training, leaving 1000 to estimate the population statistics. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions 'ODL (Mairal et al., 2010)', 'OMP', 'K-SVD', and 'Lasso' as algorithms or methods used, and links to 'spams-devel.gforge.inria.fr/' for ODL. However, it does not provide specific version numbers for any software, libraries, or programming languages used in the implementation of their work. |
| Experiment Setup | Yes | Data is sampled as described in the previous section from a ground-truth dictionary (with normalized Gaussian atoms) of size 50 × 70, from representations with cardinality k = 3. We construct 300 such samples for training, leaving 1000 to estimate the population statistics. As a learning algorithm, we employ ODL (Mairal et al., 2010) for 2000 iterations, which are more than sufficient for convergence. We employ OMP for the sparse coding step. In Figure 1a we depict the risk, or error, on both training and test sets, as a function of the number of atoms in the estimated dictionary D̂, from 70 (the size of the ground-truth model) to 500. We repeat the experiment 20 times, and present the mean together with the 25th and 75th percentiles. ... As one can see, the benefits of searching over larger models deteriorate smoothly with increasing noise, both for the general over-realized model as well as for our practical distillation approach. |
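The synthetic setup quoted above (a ground-truth 50 × 70 dictionary of normalized Gaussian atoms, k = 3 sparse representations, 300 training and 1000 test samples) can be sketched as follows. This is a minimal illustration assuming the paper's generative model x = Da with a k-sparse Gaussian coefficient vector a; the `sample` helper and the random seed are our own illustrative choices, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

d, p, k = 50, 70, 3          # signal dimension, number of atoms, cardinality
n_train, n_test = 300, 1000  # sample counts quoted in the paper

# Ground-truth dictionary: Gaussian atoms normalized to unit L2 norm.
D = rng.standard_normal((d, p))
D /= np.linalg.norm(D, axis=0, keepdims=True)

def sample(n):
    """Draw n signals x = D a with k-sparse Gaussian coefficients a."""
    X = np.empty((d, n))
    for i in range(n):
        support = rng.choice(p, size=k, replace=False)  # random support
        a = np.zeros(p)
        a[support] = rng.standard_normal(k)             # Gaussian amplitudes
        X[:, i] = D @ a
    return X

X_train, X_test = sample(n_train), sample(n_test)
```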
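Algorithm 1 (learn an over-realized dictionary of p′ > p atoms, then distill the top p most-used atoms) can likewise be sketched. The paper's experiments used the SPAMS implementation of ODL; here we substitute scikit-learn's `MiniBatchDictionaryLearning`, which also follows Mairal et al. (2010), with OMP as the sparse coding step. The function name, toy data, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_and_distill(X, p, p_over, k, seed=0):
    """X: (n_samples, d) data. Learn p_over atoms, return the p most-used."""
    dl = MiniBatchDictionaryLearning(
        n_components=p_over,            # over-realized size p' > p
        transform_algorithm='omp',      # OMP for the sparse coding step
        transform_n_nonzero_coefs=k,    # cardinality constraint
        random_state=seed,
    )
    codes = dl.fit(X).transform(X)      # sparse codes, shape (n_samples, p_over)
    usage = np.count_nonzero(codes, axis=0)   # how often each atom is selected
    top = np.argsort(usage)[::-1][:p]   # indices of the top-p used atoms
    return dl.components_[top]          # distilled dictionary, shape (p, d)

# Toy usage on random data (illustrative only):
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
D_tilde = learn_and_distill(X, p=10, p_over=30, k=3)
print(D_tilde.shape)  # (10, 20)
```

Counting nonzero code entries over the training set is the natural reading of "top p-used atoms" in the quoted pseudocode; on real recovery experiments one would then match the distilled atoms to the ground-truth dictionary up to permutation and sign.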