Poisson Variational Autoencoder
Authors: Hadi Vafaii, Dekel Galor, Jacob Yates
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the P-VAE, we perform three sets of experiments. First, we utilize the theoretical results for a linear decoder (eqs. (4) and (21)) to test the effectiveness of our reparameterization algorithm. We compare to alternative VAE models with established reparameterization tricks (e.g., Gaussian). |
| Researcher Affiliation | Academia | Hadi Vafaii (vafaii@berkeley.edu), Dekel Galor (galor@berkeley.edu), Jacob L. Yates (yates@berkeley.edu); all UC Berkeley. |
| Pseudocode | Yes | Algorithm 1: Reparameterized sampling (rsample) for the Poisson distribution. Input: rate parameter λ ∈ ℝ^{B×K}, λ > 0 (B: batch size, K: latent dimensionality); n_exp: number of exponential samples to generate; temperature: controls the sharpness of the thresholding. procedure RSAMPLE(λ, n_exp, temperature): Exp ← Exponential(λ) (create exponential distribution); Δt ← Exp.rsample((n_exp,)) (sample inter-event times, Δt: [n_exp × B × K]); times ← cumsum(Δt, dim=0) (compute arrival times, same shape as Δt); indicator ← sigmoid((1 − times) / temperature) (soft indicator for events within unit time); z ← sum(indicator, dim=0) (event counts, i.e., number of spikes, z: [B × K]); return z. end procedure. A PyTorch transcription is sketched below the table. |
| Open Source Code | Yes | Our code, data, and model checkpoints are available at this repository: https://github.com/hadivafaii/PoissonVAE |
| Open Datasets | Yes | For sparse coding results, we use 101 natural images from the van Hateren dataset [104]. We tile the images to extract 16×16 patches and apply whitening and contrast normalization, as is typically done in the sparse coding literature [3, 105]. To test the generalizability of our sparse coding results, we repeat these steps on CIFAR10 [106], a dataset we call CIFAR16×16. For the general representation learning results, we use MNIST. (A patch-preprocessing sketch appears below the table.) |
| Dataset Splits | Yes | van Hateren: #train = 107,520, #validation = 28,224; CIFAR16×16: #train = 200,000, #validation = 40,000. We use the MNIST dataset primarily for the downstream classification task. After training is done, we use the following train/validation split to evaluate the models. K-nearest-neighbor classification (tables 4 and 6): for this task, we only make use of the validation set for both training and testing of the classifier, dividing the N = 10,000 validation samples into two disjoint sets of 5,000 samples each. (See the classifier-split sketch below the table.) |
| Hardware Specification | Yes | Training all models took roughly a week on 8 RTX 6000 Ada GPUs. |
| Software Dependencies | No | We thank our anonymous reviewers for their helpful comments, and the developers of the software packages used in this project, including PyTorch [97], NumPy [122], SciPy [123], scikit-learn [124], pandas [125], matplotlib [126], and seaborn [127]. |
| Experiment Setup | Yes | For lin\|lin and conv\|lin models, we used lr = 0.005, and for conv\|conv models we used lr = 0.002. All models were trained using the AdaMax optimizer [148] with a cosine learning rate schedule [149]. Please see our code for the full details of training hyperparameters. Overall, we trained 195 VAE models, n = 5 seeds each, resulting in a total of 195 × 5 = 975 VAEs. (A minimal optimizer/scheduler sketch appears below the table.) |
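
Algorithm 1 maps directly onto PyTorch primitives. Below is a minimal sketch of the reparameterized Poisson sampler, transcribed from the pseudocode above; the function name `poisson_rsample` and the example rates are ours, and the paper's own implementation lives in the linked repository.

```python
import torch

def poisson_rsample(rate: torch.Tensor, n_exp: int, temperature: float) -> torch.Tensor:
    """Differentiable Poisson sampling via soft-thresholded exponential inter-event times."""
    # Inter-event times of a Poisson process with rate lambda are Exponential(lambda).
    exp = torch.distributions.Exponential(rate)
    dt = exp.rsample((n_exp,))               # inter-event times, shape [n_exp, B, K]
    times = torch.cumsum(dt, dim=0)          # arrival time of each event
    # Soft indicator: ~1 for events arriving before t = 1, ~0 afterwards.
    indicator = torch.sigmoid((1.0 - times) / temperature)
    return indicator.sum(dim=0)              # relaxed event counts, shape [B, K]

# Usage: relaxed Poisson samples for a batch of B=4, K=10 rates.
lam = torch.rand(4, 10) * 5 + 0.1
z = poisson_rsample(lam, n_exp=32, temperature=0.1)
```

As the temperature approaches zero, the soft indicator approaches a hard step and the output approaches exact Poisson counts; n_exp upper-bounds the number of events that can be counted, so it should exceed the largest plausible count for the given rates.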
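The patch pipeline for the sparse-coding experiments is only outlined in the excerpt (16×16 tiles, whitening, contrast normalization, with details deferred to [3, 105]). A generic sketch under those assumptions, using ZCA whitening and per-patch unit-norm contrast normalization as stand-ins for whatever those references prescribe; `extract_patches` and the eps constant are illustrative:

```python
import numpy as np

def extract_patches(images: np.ndarray, size: int = 16) -> np.ndarray:
    """Tile each (n, h, w) grayscale image into non-overlapping size x size patches."""
    n, h, w = images.shape
    return (images[:, : h - h % size, : w - w % size]
            .reshape(n, h // size, size, w // size, size)
            .transpose(0, 1, 3, 2, 4)
            .reshape(-1, size * size))

def whiten_and_normalize(patches: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    patches = patches - patches.mean(axis=1, keepdims=True)   # remove per-patch DC component
    cov = np.cov(patches, rowvar=False)                       # (size^2, size^2) covariance
    evals, evecs = np.linalg.eigh(cov)
    zca = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T  # ZCA whitening (a stand-in)
    white = patches @ zca
    # Contrast normalization: unit norm per patch (one common convention).
    return white / (np.linalg.norm(white, axis=1, keepdims=True) + eps)
```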
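The downstream-classification protocol (splitting the 10,000 MNIST validation samples into disjoint 5,000-sample fit and test halves) can be sketched as follows; the stand-in features, k = 5, and the seed are placeholders, not values from the paper:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
feats = rng.normal(size=(10_000, 16))        # stand-in for model latents on the validation set
labels = rng.integers(0, 10, size=10_000)    # stand-in for MNIST labels

# Disjoint halves of the validation set: one to fit the classifier, one to test it.
perm = rng.permutation(10_000)
fit_idx, test_idx = perm[:5_000], perm[5_000:]

knn = KNeighborsClassifier(n_neighbors=5)    # k is a placeholder; not stated in the excerpt
knn.fit(feats[fit_idx], labels[fit_idx])
accuracy = knn.score(feats[test_idx], labels[test_idx])
```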
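The training setup in the excerpt pins down the optimizer, learning rates, and schedule but not the training horizon. A minimal sketch of that configuration; the stand-in model and the 100-epoch horizon (T_max) are assumptions:

```python
import torch

model = torch.nn.Linear(256, 128)  # stand-in; the paper's encoder/decoder architectures vary

# lr = 0.005 for lin|lin and conv|lin models, 0.002 for conv|conv models (per the excerpt).
optimizer = torch.optim.Adamax(model.parameters(), lr=0.005)
# Cosine learning-rate schedule; the horizon is not given in the excerpt.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one epoch of VAE training over batches would go here ...
    scheduler.step()
```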