Variational Sparse Coding with Learned Thresholding
Authors: Kion Fallah, Christopher J Rozell
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation compared to other sparse distributions. We then compare to a standard variational autoencoder using a DNN generator on the Fashion MNIST and CelebA datasets. |
| Researcher Affiliation | Academia | Kion Fallah, Christopher J. Rozell (ML@GT, Georgia Institute of Technology, Atlanta, Georgia). Correspondence to: Kion Fallah <kion@gatech.edu>. |
| Pseudocode | Yes | Algorithm 1: Training with Thresholded Samples (a hedged sketch appears below the table). |
| Open Source Code | Yes | 1Code available at: https://github.com/kfallah/variational-sparse-coding. |
| Open Datasets | Yes | We train on 80,000 16x16 training patches... We showcase the performance of our method compared to other inference strategies by training and analyzing a linear generator on whitened image patches (Olshausen & Field, 1996) and a DNN generator on the Fashion MNIST (Xiao et al., 2017) and CelebA (Liu et al., 2015) datasets. |
| Dataset Splits | Yes | We train on 80,000 16x16 training patches... We train for 300 epochs using a batch size of 100. For CelebA, we use 150,000 training samples and 19,000 validation samples. |
| Hardware Specification | Yes | We train for 300 epochs using a batch size of 512 across two Nvidia RTX 3080s. |
| Software Dependencies | Yes | Additionally, we use the automatic mixed precision (AMP) and Distributed Data Parallel implementations included in PyTorch 1.10 (Paszke et al., 2019). (See the AMP/DDP sketch below the table.) |
| Experiment Setup | Yes | We train for 300 epochs using a batch size of 100. Our initial learning rate for the dictionary is 5E-01 and we apply an exponential decay by a factor of 0.99 each epoch. Our inference network is trained with an initial learning rate of 1E-02, using an SGD+Nesterov optimizer with a Cycle Scheduler. We use an initial learning rate of 3E-04 using the Adam optimizer with β = (0.5, 0.999), weight decay equal to 1E-05, and a sample budget of J = 10. (See the optimizer sketch below the table.) |
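
The "Pseudocode" row cites Algorithm 1, Training with Thresholded Samples. The authors' actual implementation lives in the repository linked under "Open Source Code"; the snippet below is only a minimal sketch of the idea, assuming a Gaussian reparameterized posterior, a soft-threshold (shrinkage) nonlinearity with a learned threshold, a linear generator, and a best-of-J sample budget. All identifiers and the placeholder Gaussian KL term are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def shrinkage(z, lam):
    """Soft-threshold operator: entries with magnitude below lam become exactly zero."""
    return torch.sign(z) * F.relu(z.abs() - lam)

def training_step(x, encoder, dictionary, optimizer, J=10):
    # Encoder predicts posterior parameters and a (learned) threshold per latent unit.
    mu, log_scale, lam = encoder(x)                 # each: (batch, latent_dim)
    # Sample budget: draw J reparameterized samples per input.
    eps = torch.randn(J, *mu.shape, device=mu.device)
    z = mu + log_scale.exp() * eps                  # (J, batch, latent_dim)
    # Learned thresholding yields exactly sparse codes.
    s = shrinkage(z, lam)
    # Linear generator (dictionary: input_dim x latent_dim); keep the best of J samples.
    recon = s @ dictionary.T                        # (J, batch, input_dim)
    mse = ((recon - x) ** 2).sum(-1)                # (J, batch)
    recon_loss = mse.min(dim=0).values.mean()
    # Placeholder standard-Gaussian KL; the paper's prior/KL term differs.
    kl = (-0.5 * (1 + 2 * log_scale - mu ** 2 - (2 * log_scale).exp())).sum(-1).mean()
    loss = recon_loss + kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```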
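
The "Hardware Specification" and "Software Dependencies" rows describe training across two Nvidia RTX 3080s with the AMP and Distributed Data Parallel implementations in PyTorch 1.10. The sketch below shows how those two pieces are typically wired together with that version's API; `build_model` and `train_loader` are assumed placeholders, and the Adam learning rate is taken from the quoted setup.

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU, e.g. launched with `torchrun --nproc_per_node=2 train.py`.
torch.distributed.init_process_group("nccl")
local_rank = torch.distributed.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = build_model().cuda(local_rank)          # build_model: assumed helper
model = DDP(model, device_ids=[local_rank])
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()            # AMP loss scaling

for x, _ in train_loader:                       # train_loader: assumed DataLoader
    x = x.cuda(local_rank, non_blocking=True)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():             # mixed-precision forward pass
        loss = model(x)                         # model is assumed to return its loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```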
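
The "Experiment Setup" row quotes three optimizer configurations (dictionary, inference network, and DNN generator). A hedged PyTorch translation follows; `dictionary`, `inference_net`, `dnn_generator`, and `steps_per_epoch` are placeholders, the SGD momentum value is an assumption, and the quoted "Cycle Scheduler" is assumed to correspond to `OneCycleLR`.

```python
import torch

# Dictionary (linear generator): initial lr 5E-01 with 0.99 exponential decay per epoch.
dict_opt = torch.optim.SGD([dictionary], lr=5e-1)
dict_sched = torch.optim.lr_scheduler.ExponentialLR(dict_opt, gamma=0.99)

# Inference network: SGD + Nesterov, initial lr 1E-02, cycle schedule over 300 epochs.
inf_opt = torch.optim.SGD(inference_net.parameters(), lr=1e-2,
                          momentum=0.9, nesterov=True)   # momentum value assumed
inf_sched = torch.optim.lr_scheduler.OneCycleLR(
    inf_opt, max_lr=1e-2, epochs=300, steps_per_epoch=steps_per_epoch)

# DNN generator experiments: Adam with lr 3E-04, betas (0.5, 0.999), weight decay 1E-05.
gen_opt = torch.optim.Adam(dnn_generator.parameters(), lr=3e-4,
                           betas=(0.5, 0.999), weight_decay=1e-5)
```

Following PyTorch conventions, `dict_sched.step()` would be called once per epoch (matching the quoted per-epoch decay), while `inf_sched.step()` is called once per batch.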