GFlowNet-EM for Learning Compositional Latent Variable Models
Authors: Edward J. Hu, Nikolay Malkin, Moksh Jain, Katie E. Everett, Alexandros Graikos, Yoshua Bengio
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder. |
| Researcher Affiliation | Collaboration | 1 Mila, Université de Montréal; 2 Google Research; 3 Massachusetts Institute of Technology; 4 Stony Brook University; 5 CIFAR Fellow. |
| Pseudocode | Yes | Algorithm 1 GFlowNet-EM: Basic form with thresholding [...] Algorithm 2 GFlowNet-EM: E-step (sleep phase) (a schematic sketch of this E-/M-step alternation is given after the table) |
| Open Source Code | Yes | Code: github.com/GFNOrg/GFlowNet-EM. |
| Open Datasets | Yes | We use a subset of the Penn Treebank (PTB; Marcus et al., 1999) that contains sentences with 20 or fewer tokens. [...] We perform our experiments on the static MNIST dataset (Deng, 2012). |
| Dataset Splits | Yes | We use a subset of the Penn Treebank (PTB; Marcus et al., 1999) that contains sentences with 20 or fewer tokens. Otherwise, we follow the preprocessing done by Kim et al. (2019). [...] We perform our experiments on the static MNIST dataset (Deng, 2012), with a 4 × 4 spatial latent representation and using dictionaries of sizes K ∈ {4, 8, 10} and dimensionality D = 1. |
| Hardware Specification | Yes | Grammar induction: Our experiments with the context-free grammar take 23 hours to run to completion on a single V100 GPU [...] an E-step (training the GFlowNet encoder) takes approximately 25s for 400 updates, whereas the M-step (training the convolutional decoder) requires 10s for 400 updates on one A5000 GPU. |
| Software Dependencies | Yes | We use Torch-Struct (Rush, 2020) to perform marginalization and exact sampling in PCFGs. |
| Experiment Setup | Yes | Training hyperparameters are listed in Table 3. [...] We used a similar architecture to the one described in van den Oord et al. (2017), adding batch normalization and additional downsizing and upsizing convolutional layers to obtain the smaller 4 × 4 latent representation. [...] For K ∈ {4, 8}, we trained the VQ-VAE model for 50 epochs with a learning rate of 2 × 10⁻⁴, reduced to 5 × 10⁻⁵ at epoch 25. [...] We disabled all batch normalization layers for the GFlowNet experiments and used a batch size of 128 in all our tests. (These settings are collected into a sketch configuration after the table.) |
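
The Pseudocode row refers to Algorithms 1 and 2 of the paper, which alternate a GFlowNet E-step (fitting an amortized sampler of the posterior over latents) with an M-step on the generative model. The sketch below is a minimal toy illustration of that alternation under strong simplifying assumptions, not the authors' implementation: the latent is a single categorical variable rather than a compositional object such as a parse tree, the sampler is trained with a trajectory-balance-style squared loss (which for a one-step latent reduces to matching `log Z + log q(z|x)` to `log p(x, z)`), and all names (`Decoder`, `Encoder`, `log_joint`, and so on) are hypothetical.

```python
# Toy sketch of a GFlowNet-EM-style alternation (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

K, D_OBS = 8, 16  # number of latent values, observation dimension


class Decoder(nn.Module):
    """Generative model p(x, z) = p(z) p(x | z), updated in the M-step."""
    def __init__(self):
        super().__init__()
        self.log_prior = nn.Parameter(torch.zeros(K))
        self.mean = nn.Linear(K, D_OBS)  # Gaussian likelihood with unit variance

    def log_joint(self, x, z):  # z is a one-hot [batch, K] tensor
        log_pz = F.log_softmax(self.log_prior, dim=-1)[z.argmax(-1)]
        log_px = -0.5 * ((x - self.mean(z)) ** 2).sum(-1)
        return log_pz + log_px


class Encoder(nn.Module):
    """Amortized sampler q(z | x): the GFlowNet policy trained in the E-step."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Linear(D_OBS, K)
        self.log_Z = nn.Linear(D_OBS, 1)  # per-input log-partition estimate

    def forward(self, x):
        return F.log_softmax(self.logits(x), dim=-1), self.log_Z(x).squeeze(-1)


decoder, encoder = Decoder(), Encoder()
opt_dec = torch.optim.Adam(decoder.parameters(), lr=1e-3)
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
x = torch.randn(128, D_OBS)  # stand-in batch of observations

for step in range(1000):
    # E-step: train the sampler so that q(z | x) is proportional to p(x, z),
    # via a trajectory-balance-style squared loss (one step, so log P_F = log q).
    log_q, log_Z = encoder(x)
    z = F.one_hot(torch.distributions.Categorical(logits=log_q).sample(), K).float()
    tb_loss = (log_Z + (log_q * z).sum(-1) - decoder.log_joint(x, z).detach()) ** 2
    opt_enc.zero_grad(); tb_loss.mean().backward(); opt_enc.step()

    # M-step: maximize log p(x, z) with latents drawn from the current sampler.
    with torch.no_grad():
        log_q, _ = encoder(x)
        z = F.one_hot(torch.distributions.Categorical(logits=log_q).sample(), K).float()
    nll = -decoder.log_joint(x, z).mean()
    opt_dec.zero_grad(); nll.backward(); opt_dec.step()
```

The design point mirrored here is that the decoder's log-joint enters the E-step only as a detached reward for the sampler, while the M-step uses latents drawn from the current GFlowNet in place of exact posterior samples.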
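
For the discrete-VAE experiments, the settings quoted in the Dataset Splits and Experiment Setup rows can be collected into one configuration object. The sketch below only restates those reported values; the dataclass and its field names are illustrative assumptions, not taken from the released code.

```python
from dataclasses import dataclass


@dataclass
class DiscreteVAESetup:
    """Settings quoted in the table above; field names are illustrative."""
    dataset: str = "static MNIST"
    latent_grid: tuple = (4, 4)          # 4 x 4 spatial latent representation
    codebook_sizes: tuple = (4, 8, 10)   # dictionary sizes K explored
    code_dim: int = 1                    # dimensionality D of each code
    batch_size: int = 128                # used in all tests
    vqvae_epochs: int = 50               # VQ-VAE baseline, K in {4, 8}
    vqvae_lr: float = 2e-4               # reduced at epoch 25 ...
    vqvae_lr_after_epoch_25: float = 5e-5
    gflownet_batch_norm: bool = False    # batch norm disabled for GFlowNet runs


print(DiscreteVAESetup())
```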