Understanding Trainable Sparse Coding with Matrix Factorization

Authors: Thomas Moreau, Joan Bruna

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 3 presents the generic architectures that will enable the usage of such schemes and the numerical experiments, which validate our analysis over a range of different scenarios.
Researcher Affiliation | Academia | Thomas Moreau, CMLA, ENS Cachan, CNRS, Université Paris-Saclay, 94235 Cachan, France (thomas.moreau@cmla.ens-cachan.fr); Joan Bruna, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA (joan.bruna@berkeley.edu)
Pseudocode | No | The paper describes algorithms using mathematical equations and descriptions of steps, such as (16) and (18), but does not present them in a formal pseudocode block or explicitly labeled algorithm section.
Open Source Code | Yes | The code to reproduce the figures is available online; it can be found at https://github.com/tomMoral/AdaptiveOptim
Open Datasets | Yes | A highly structured dictionary composed of translation-invariant Haar wavelets is used to encode 8x8 patches of images from the PASCAL VOC 2008 dataset; LISTA was used to encode MNIST images over an unconstrained dictionary.
Dataset Splits | No | The paper mentions using training and test sets from standard datasets like PASCAL VOC 2008 and MNIST, but does not provide specific train/validation/test split percentages or details for a distinct validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | All the experiments were run using Python and Tensorflow. For all the experiments, the training is performed using Adagrad (Duchi et al., 2011). The dictionary of 100 atoms was learned from 10000 MNIST images in grayscale rescaled to 17x17 using the implementation of Mairal et al. (2009) proposed in scikit-learn. (A dictionary-learning sketch follows the table.)
Experiment Setup | Yes | The values are set to m=100, n=64 for the dictionary dimension, ρ=5/m for the sparsity level, and σ=10 for the activation coefficient generation parameters. The sparsity regularization is set to λ=0.01. The training is performed using Adagrad (Duchi et al., 2011). The architecture Φ_Θ^K, with parameters Θ = (W_g^(k), W_e^(k), θ^(k))_{k=1,...,K}, is obtained by unfolding the recurrent network K times. The network FacNet, Ψ_Θ^K, is formed using layers such that z_{k+1} = ψ_Θ^k(z_k) := A^T h_{λS^{-1}}(A z_k − S^{-1} A (D^T D z_k − D^T x)), with S diagonal and A unitary the parameters of the k-th layer. (A layer sketch follows the table.)
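
The Experiment Setup row gives one FacNet layer in closed form. Below is a minimal NumPy sketch of that update, not the authors' TensorFlow implementation: the dimensions (m=100, n=64) and λ=0.01 follow the row above, but the random dictionary, the random unitary A, and the ISTA-like choice for the diagonal S are assumptions made only so the sketch runs (in the paper, A and S are the learned parameters of each layer).

```python
# Illustrative sketch of one FacNet-style layer:
#   z_{k+1} = A^T h_{λ S^{-1}}( A z_k − S^{-1} A (D^T D z_k − D^T x) )
# Assumptions: random D, random unitary A, S set from ||D||_2^2.
import numpy as np

def soft_threshold(u, t):
    """Elementwise soft-thresholding h_t(u) = sign(u) * max(|u| - t, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def facnet_layer(z, x, D, A, s_diag, lam):
    """Rotate by A, take a scaled gradient step, shrink, rotate back."""
    grad = D.T @ (D @ z - x)             # gradient of 0.5 * ||x - D z||^2
    u = A @ z - (A @ grad) / s_diag      # A z_k − S^{-1} A (D^T D z_k − D^T x)
    return A.T @ soft_threshold(u, lam / s_diag)

rng = np.random.default_rng(0)
m, n, lam = 100, 64, 0.01
D = rng.standard_normal((n, m)) / np.sqrt(n)       # stand-in dictionary (assumption)
A, _ = np.linalg.qr(rng.standard_normal((m, m)))   # a unitary A (assumption; learned in the paper)
s_diag = np.full(m, np.linalg.norm(D, 2) ** 2)     # diagonal of S, ISTA-like scaling (assumption)

x = rng.standard_normal(n)
z = np.zeros(m)
for _ in range(5):                                 # unfold a few layers with shared parameters
    z = facnet_layer(z, x, D, A, s_diag, lam)
```

With A fixed to the identity and S to the Lipschitz constant, each layer reduces to a plain ISTA step, which is the starting point the unfolded architecture generalizes.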
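The Software Dependencies row states that the 100-atom dictionary was learned with the scikit-learn implementation of Mairal et al. (2009). The following is a hedged sketch of that step: random arrays stand in for the 10000 grayscale MNIST images rescaled to 17x17, and the alpha and batch_size values are assumptions, not settings reported in the paper.

```python
# Hedged sketch of the dictionary-learning step (online dictionary
# learning of Mairal et al., 2009, as implemented in scikit-learn).
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
X = rng.random((10000, 17 * 17))   # placeholder for 10000 MNIST images rescaled to 17x17
X -= X.mean(axis=0)                # centering, a common preprocessing assumption

learner = MiniBatchDictionaryLearning(n_components=100, alpha=1.0,
                                      batch_size=256, random_state=0)
D = learner.fit(X).components_     # learned dictionary, shape (100, 289)
```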