Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data

Authors: Rabia Gondur, Usama Bin Sikandar, Evan Schaffer, Mikio Christian Aoi, Stephen L Keeley

ICLR 2024

Reproducibility assessment. Each variable below is listed with its result and the supporting LLM response:
Research Type: Experimental
LLM Response: We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) not only identifies the shared and independent latent structure across modalities accurately, but also provides good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework in two real-world multimodal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.
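To make the simulated setting above concrete, here is a minimal sketch of how such multi-modal data can be generated: a smooth latent trajectory drawn from a Gaussian-process prior drives both the digit's rotation/scaling and the neurons' Poisson firing rates. The kernel construction, gains, and all names below are our own illustrative assumptions, not the authors' code.

```python
import numpy as np

def rbf_kernel(T, length_scale, variance=1.0, jitter=1e-6):
    """Squared-exponential covariance over T evenly spaced time points."""
    t = np.arange(T)[:, None]
    K = variance * np.exp(-0.5 * (t - t.T) ** 2 / length_scale ** 2)
    return K + jitter * np.eye(T)

rng = np.random.default_rng(0)
T, N = 100, 100  # time bins per trial, number of simulated neurons

# Smooth shared latent trajectory sampled from a GP prior.
z_shared = rng.multivariate_normal(np.zeros(T), rbf_kernel(T, length_scale=150))

# Behavioral modality: the latent sets the digit's rotation angle (and could
# likewise set its scale) at each time point; the gain is arbitrary.
angles = 45.0 * z_shared  # degrees, one per time bin

# Neural modality: Poisson counts with log-rates given by a linear readout
# of the latent plus a per-neuron offset (baseline log-rate).
W = rng.normal(0, 0.5, size=(N, 1))
d = np.log(5.0) * np.ones(N)                   # baseline ~5 spikes/bin
rates = np.exp(W @ z_shared[None, :] + d[:, None])
spikes = rng.poisson(rates)                    # (N, T) spike-count array
```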
Researcher Affiliation: Academia
LLM Response: Rabia Gondur (Fordham University, rgondur@fordham.edu); Usama Bin Sikandar (Georgia Institute of Technology, usama@gatech.edu); Evan S. Schaffer (Icahn School of Medicine at Mount Sinai, evan.schaffer@mssm.edu); Mikio Aoi (University of California, San Diego, maoi@ucsd.edu); Stephen Keeley (Fordham University, skeeley1@fordham.edu)
Pseudocode: No
LLM Response: The paper describes its models and methods in text and with diagrams (e.g., Figure 1a, Figure 2b) but does not include any explicit pseudocode or algorithm blocks.
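In lieu of pseudocode in the paper, the following is a pseudocode-style PyTorch sketch of the decoder structure implied by the text: a neural-network image decoder over the shared plus image-independent latents, and a Poisson readout over the shared plus neural-independent latents. Module names and layer sizes are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MMGPVAEDecoderSketch(nn.Module):
    """Illustrative decoder: shared + per-modality independent latents.

    Mirrors the paper's textual description only; sizes and names are
    our own assumptions.
    """
    def __init__(self, img_dim=784, n_neurons=100):
        super().__init__()
        # Image decoder sees [shared, image-independent] latents (2 dims).
        self.img_net = nn.Sequential(nn.Linear(2, 128), nn.Tanh(),
                                     nn.Linear(128, img_dim))
        # Neural readout sees [shared, neural-independent] latents (2 dims).
        self.neural_readout = nn.Linear(2, n_neurons)

    def forward(self, z_shared, z_img, z_neural):
        # Each z_* is a (T, 1) latent trajectory for one trial.
        img_mean = self.img_net(torch.cat([z_shared, z_img], dim=-1))
        log_rates = self.neural_readout(torch.cat([z_shared, z_neural], dim=-1))
        return img_mean, torch.exp(log_rates)  # Gaussian mean, Poisson rates
```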
Open Source Code: Yes
LLM Response: An implementation of MM-GPVAE can be found at the GitHub repository for MM-GPVAE.
Open Datasets: Yes
LLM Response: We validate our MM-GPVAE model on a simulated dataset of a smoothly rotating and scaling MNIST digit alongside simulated Poisson spike counts from 100 neurons. [...] We additionally show for comparison the 2-d depiction of the shared and independent latent representation of all trials in the dataset (Figure 19), with all five of the behaviors labelled. [...] For our hawkmoth data, the original synthetic visual stimuli were sampled at 125 Hz and the neural and torque recordings were sampled at 10 kHz (Sprayberry and Daniel, 2007; Sikandar et al., 2023; Putney et al., 2019).
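Given the sampling-rate mismatch quoted above (125 Hz stimuli vs. 10 kHz recordings), a typical preprocessing step is to bin the high-rate recordings onto the stimulus grid. This sketch assumes simple sum-binning with 80 samples per bin; the function name and array shapes are hypothetical, not taken from the paper.

```python
import numpy as np

fs_neural, fs_stim = 10_000, 125          # Hz, as reported above
samples_per_bin = fs_neural // fs_stim    # 80 neural samples per stimulus frame

def bin_to_stimulus_rate(x, samples_per_bin):
    """Sum-bin a (channels, samples) array onto the coarser stimulus grid."""
    n_bins = x.shape[1] // samples_per_bin
    trimmed = x[:, :n_bins * samples_per_bin]
    return trimmed.reshape(x.shape[0], n_bins, samples_per_bin).sum(axis=2)

spikes_10khz = np.random.poisson(0.01, size=(10, 10_000))  # 1 s of toy data
spikes_125hz = bin_to_stimulus_rate(spikes_10khz, samples_per_bin)  # (10, 125)
```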
Dataset Splits: Yes
LLM Response: The data was split into 80% for training and 20% for testing. For this simulated example, one latent represents an interpretable modulation of the image, as it directly affects the scaling of the MNIST digit.
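A trial-level 80/20 split consistent with the quoted protocol might look like the sketch below; the shuffling, seed, and trial count are our own choices, since the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 500                     # hypothetical trial count
idx = rng.permutation(n_trials)    # shuffle trials before splitting
n_train = int(0.8 * n_trials)
train_idx, test_idx = idx[:n_train], idx[n_train:]
```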
Hardware Specification: Yes
LLM Response: All models in this manuscript were trained end-to-end in PyTorch using the Adam optimizer. Training was done on a MacBook Pro with an Apple M1 Max chip, and all evaluations took less than an hour to fit.
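For context, end-to-end training with Adam in PyTorch follows the standard pattern below; the stand-in model, loss, and learning rate are placeholders, as the paper reports only the optimizer and hardware.

```python
import torch

model = torch.nn.Linear(3, 100)  # stand-in for the MM-GPVAE modules
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(200):
    optimizer.zero_grad()
    z = torch.randn(32, 3)         # placeholder latent batch
    loss = model(z).pow(2).mean()  # placeholder for the negative ELBO
    loss.backward()
    optimizer.step()
```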
Software Dependencies: No
LLM Response: The paper mentions software such as PyTorch and DeepLabCut, but does not provide specific version numbers for these dependencies.
Experiment Setup: Yes
LLM Response: In this work, α is set to a fixed value of 1e-2 for all experiments except for the final data analysis example, where it is set to a value of 1e-4. [...] We set the neural-independent subspace to 1-dimensional, the image-independent subspace to 1-dimensional, and an additional 1 dimension for the shared subspace. To encourage slow-evolving smooth latents in the shared and image subspaces, and faster-evolving neural latents, we initialized the length-scale parameters for each latent dimension to different values. The length scale was set to 10 for the neural latents, 150 for the shared latent, and 300 for the image latent. [...] The Fourier frequency pruning was set to the minimum length scale of 10, 10, 3, and 16 for GP-VAE (simulated), MM-GPVAE (simulated), MM-GPVAE (fly), and MM-GPVAE (moth), respectively. GP length-scale parameters were initialized to a value of 30 for all except the hawkmoth evaluations (where initial values are indicated above), and jointly optimized with the ELBO. The covariance parameter α was set at a fixed value of 1e-2, 1e-2, 1e-3, and 1e-4 for GP-VAE (simulated), MM-GPVAE (simulated), MM-GPVAE (fly), and MM-GPVAE (moth), respectively. We additionally initialized the offsets d of the neural modality to the average log-rate of the neural data.
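To make the per-experiment settings easier to compare, the dictionary below collects the values quoted above. The layout and key names are ours; on our reading of the quoted text, the 10/150/300 length-scale initializations apply to the hawkmoth evaluation, with 30 used elsewhere.

```python
# Settings quoted above, gathered per experiment; key names are our own.
# alpha: fixed covariance parameter; min_ls: Fourier-pruning minimum length
# scale; init_ls: initial GP length scale(s), jointly optimized with the ELBO.
CONFIGS = {
    "GP-VAE (simulated)":   {"alpha": 1e-2, "min_ls": 10, "init_ls": 30},
    "MM-GPVAE (simulated)": {"alpha": 1e-2, "min_ls": 10, "init_ls": 30},
    "MM-GPVAE (fly)":       {"alpha": 1e-3, "min_ls": 3,  "init_ls": 30},
    "MM-GPVAE (moth)":      {"alpha": 1e-4, "min_ls": 16,
                             # our reading: per-subspace initial length scales
                             "init_ls": {"neural": 10, "shared": 150, "image": 300}},
}
# The neural-modality offsets d are initialized to the average log-rate of the
# spike counts, e.g. d = np.log(spikes.mean(axis=1) + eps) for (N, T) counts.
```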