Variational Sparse Coding with Learned Thresholding

Authors: Kion Fallah, Christopher J. Rozell

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation compared to other sparse distributions. We then compare to a standard variational autoencoder using a DNN generator on the Fashion MNIST and CelebA datasets."
Researcher Affiliation | Academia | "Kion Fallah¹, Christopher J. Rozell¹. ¹ML@GT, Georgia Institute of Technology, Atlanta, Georgia. Correspondence to: Kion Fallah <kion@gatech.edu>."
Pseudocode | Yes | "Algorithm 1: Training with Thresholded Samples"
Open Source Code | Yes | "Code available at: https://github.com/kfallah/variational-sparse-coding."
Open Datasets | Yes | "We train on 80,000 16x16 training patches... We showcase the performance of our method compared to other inference strategies by training and analyzing a linear generator on whitened image patches (Olshausen & Field, 1996) and a DNN generator on the Fashion MNIST (Xiao et al., 2017) and CelebA (Liu et al., 2015) datasets."
Dataset Splits | Yes | "We train on 80,000 16x16 training patches... We train for 300 epochs using a batch size of 100. For CelebA, we use 150,000 training samples and 19,000 validation samples."
Hardware Specification | Yes | "We train for 300 epochs using a batch size of 512 across two Nvidia RTX 3080s."
Software Dependencies | Yes | "Additionally, we use the automatic mixed precision (AMP) and Distributed Data Parallel implementations included in PyTorch 1.10 (Paszke et al., 2019)."
Experiment Setup | Yes | "We train for 300 epochs using a batch size of 100. Our initial learning rate for the dictionary is 5E-01 and we apply an exponential decay by a factor of 0.99 each epoch. Our inference network is trained with an initial learning rate of 1E-02, using an SGD+Nesterov optimizer with a Cycle Scheduler. We use an initial learning rate of 3E-04 using the Adam optimizer with β = (0.5, 0.999), weight decay equal to 1E-05, and a sample budget of J = 10."
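
The hyperparameters quoted in the "Experiment Setup" row can be read as a concrete PyTorch training configuration. The sketch below is a hedged reconstruction, not the authors' code: the module names and shapes, the SGD momentum value, and the OneCycleLR reading of "Cycle Scheduler" are assumptions, while the learning rates, decay factor, Adam betas, weight decay, and sample budget follow the quote.

# Hedged sketch of the training configuration quoted in the "Experiment Setup"
# row. Module shapes/names and the SGD momentum value are assumptions; the
# learning rates, decay factor, Adam betas, and weight decay follow the quote.
import torch

dictionary = torch.nn.Parameter(torch.randn(256, 169))  # assumed dictionary shape
encoder = torch.nn.Linear(256, 169)                      # placeholder inference network
generator = torch.nn.Linear(169, 256)                    # placeholder DNN generator
epochs, steps_per_epoch, J = 300, 800, 10                # J = quoted sample budget; steps per epoch assumed

# Dictionary: SGD at 5E-01, decayed exponentially by a factor of 0.99 once per epoch.
dict_opt = torch.optim.SGD([dictionary], lr=5e-1)
dict_sched = torch.optim.lr_scheduler.ExponentialLR(dict_opt, gamma=0.99)

# Inference network: SGD + Nesterov at 1E-02 with a cyclic learning-rate
# schedule (OneCycleLR here is one interpretation of "Cycle Scheduler").
enc_opt = torch.optim.SGD(encoder.parameters(), lr=1e-2, momentum=0.9, nesterov=True)
enc_sched = torch.optim.lr_scheduler.OneCycleLR(
    enc_opt, max_lr=1e-2, epochs=epochs, steps_per_epoch=steps_per_epoch)

# DNN generator: Adam at 3E-04 with betas (0.5, 0.999) and weight decay 1E-05.
gen_opt = torch.optim.Adam(generator.parameters(), lr=3e-4,
                           betas=(0.5, 0.999), weight_decay=1e-5)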
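
The "Software Dependencies" row names PyTorch 1.10's automatic mixed precision and Distributed Data Parallel. Below is a minimal sketch of how those two pieces are typically combined; the model, loss, and process-group bootstrapping (one process per GPU, launched e.g. via torch.multiprocessing.spawn with MASTER_ADDR/MASTER_PORT set) are placeholders rather than the authors' implementation.

# Minimal AMP + DistributedDataParallel sketch; placeholder model and loss.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = DDP(torch.nn.Linear(256, 256).to(rank), device_ids=[rank])  # placeholder network
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    scaler = torch.cuda.amp.GradScaler()            # AMP loss scaling

    for _ in range(10):                             # dummy training steps
        x = torch.randn(512, 256, device=rank)      # batch size 512, as quoted
        opt.zero_grad()
        with torch.cuda.amp.autocast():             # mixed-precision forward pass
            loss = model(x).pow(2).mean()           # placeholder loss
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()
    dist.destroy_process_group()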