A Generative Model of Symmetry Transformations
Authors: James Allingham, Bruno Mlodozeniec, Shreyas Padhy, Javier Antorán, David Krueger, Richard Turner, Eric Nalisnick, José Miguel Hernández-Lobato
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model can be seen as a generative process for data augmentation. We provide a simple algorithm for learning our generative model and empirically demonstrate its ability to capture symmetries under affine and color transformations, in an interpretable way. Combining our symmetry model with standard generative models results in higher marginal test-log-likelihoods and improved data efficiency. |
| Researcher Affiliation | Collaboration | James Urquhart Allingham (University of Cambridge, jua23@cam.ac.uk); Bruno Kacper Mlodozeniec (University of Cambridge and MPI for Intelligent Systems, Tübingen, bkm28@cam.ac.uk); Shreyas Padhy (University of Cambridge, sp2058@cam.ac.uk); Javier Antorán (University of Cambridge and Ångstrom AI, ja666@cam.ac.uk); David Krueger (University of Cambridge, david.scott.krueger@gmail.com); Richard E. Turner (University of Cambridge, ret26@cam.ac.uk); Eric Nalisnick (University of Amsterdam, e.t.nalisnick@uva.nl); José Miguel Hernández-Lobato (University of Cambridge, jmh233@cam.ac.uk) |
| Pseudocode | Yes | Algorithm 1 Learning |
| Open Source Code | Yes | The code is available at https://github.com/cambridge-mlg/sgm. |
| Open Datasets | Yes | We conduct experiments using three datasets (dSprites [Matthey et al., 2017], MNIST, and GalaxyMNIST [Walmsley et al., 2022]) and two kinds of transformations (affine and color). |
| Dataset Splits | Yes | We split the MNIST training set by removing the last 10k examples and using them exclusively for validation and hyperparameter sweeps. ... This dataset contains 10k examples. We use the last 2k as our test set, and the previous 1k as a validation set. ... The dataset has dedicated train, test, and validation splits which we use without any modifications. A hedged split sketch is given after the table. |
| Hardware Specification | Yes | The experiments for this paper were performed on a cluster equipped with NVIDIA A100 GPUs. ... This work was also supported with Cloud TPUs from Google's TPU Research Cloud (TRC). |
| Software Dependencies | No | We use jax with flax for NNs, distrax for probability distributions, and optax for optimizers. We use ciclo with clu to manage our training loops, ml_collections to specify our configurations, and wandb to track our experiments. The paper lists software libraries used but does not specify version numbers for them. |
| Experiment Setup | Yes | Unless otherwise specified, we use the following NN architectures and other hyperparameters for all of our experiments. We use the AdamW optimizer with weight decay of 1 × 10⁻⁴, global norm gradient clipping, and a linear warm-up followed by a cosine decay as a learning rate schedule. The exact learning rates and schedules for each model are discussed below. We use a batch size of 512. ... Inference network. We use an MLP with hidden layers of dimension [2048, 1024, 512, 256]. A hedged optax/flax sketch of this setup follows the table. |
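The dataset splits quoted above are straightforward to reproduce from the stated sizes alone. Below is a minimal sketch, assuming the MNIST training arrays and the 10k-example dataset (presumably GalaxyMNIST) are already loaded in memory in their distributed order; the function names and loader interface are placeholders, and only the split sizes come from the paper.

```python
import numpy as np

def split_mnist(train_images: np.ndarray, train_labels: np.ndarray):
    """Hold out the last 10k MNIST training examples for validation and sweeps."""
    x_train, x_val = train_images[:-10_000], train_images[-10_000:]
    y_train, y_val = train_labels[:-10_000], train_labels[-10_000:]
    return (x_train, y_train), (x_val, y_val)

def split_galaxy_mnist(images: np.ndarray, labels: np.ndarray):
    """10k examples total: last 2k -> test, previous 1k -> validation, rest -> train."""
    x_test, y_test = images[-2_000:], labels[-2_000:]
    x_val, y_val = images[-3_000:-2_000], labels[-3_000:-2_000]
    x_train, y_train = images[:-3_000], labels[:-3_000]
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```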
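The optimizer and architecture details in the Experiment Setup row map directly onto the libraries listed under Software Dependencies (jax, flax, optax). The sketch below is an illustration under those stated hyperparameters, not the authors' code: the peak learning rate, warm-up and decay step counts, clipping norm, activation function, and output dimension are assumptions, since the row notes that exact learning rates and schedules differ per model.

```python
import flax.linen as nn
import optax

BATCH_SIZE = 512  # from the paper

# Linear warm-up followed by cosine decay; peak LR and step counts are
# placeholders -- the paper specifies these per model.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=1e-3,     # assumption
    warmup_steps=1_000,  # assumption
    decay_steps=50_000,  # assumption
)

# AdamW with weight decay 1e-4 and global-norm gradient clipping.
optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),  # clipping norm is an assumption
    optax.adamw(learning_rate=schedule, weight_decay=1e-4),
)

class InferenceMLP(nn.Module):
    """Inference network: MLP with hidden layers [2048, 1024, 512, 256]."""
    hidden_dims: tuple = (2048, 1024, 512, 256)
    out_dim: int = 6  # e.g. affine transformation parameters; assumption

    @nn.compact
    def __call__(self, x):
        x = x.reshape((x.shape[0], -1))  # flatten image inputs
        for dim in self.hidden_dims:
            x = nn.relu(nn.Dense(dim)(x))  # activation choice is an assumption
        return nn.Dense(self.out_dim)(x)
```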