GFlowNet-EM for Learning Compositional Latent Variable Models

Authors: Edward J Hu, Nikolay Malkin, Moksh Jain, Katie E Everett, Alexandros Graikos, Yoshua Bengio

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder.
Researcher Affiliation | Collaboration | (1) Mila, Université de Montréal; (2) Google Research; (3) Massachusetts Institute of Technology; (4) Stony Brook University; (5) CIFAR Fellow.
Pseudocode | Yes | Algorithm 1 GFlowNet-EM: Basic form with thresholding [...] Algorithm 2 GFlowNet-EM: E-step (sleep phase)
Open Source Code | Yes | Code: github.com/GFNOrg/GFlowNet-EM.
Open Datasets | Yes | We use a subset of the Penn Treebank (PTB; Marcus et al., 1999) that contains sentences with 20 or fewer tokens. [...] We perform our experiments on the static MNIST dataset (Deng, 2012).
Dataset Splits | Yes | We use a subset of the Penn Treebank (PTB; Marcus et al., 1999) that contains sentences with 20 or fewer tokens. Otherwise, we follow the preprocessing done by Kim et al. (2019). [...] We perform our experiments on the static MNIST dataset (Deng, 2012), with a 4 × 4 spatial latent representation and using dictionaries of sizes K ∈ {4, 8, 10} and dimensionality D = 1.
Hardware Specification | Yes | Grammar induction: Our experiments with the context-free grammar take 23 hours to run to completion on a single V100 GPU [...] an E-step (training the GFlowNet encoder) takes approximately 25s for 400 updates, whereas the M-step (training the convolutional decoder) requires 10s for 400 updates on one A5000 GPU.
Software Dependencies | Yes | We use Torch-Struct (Rush, 2020) to perform marginalization and exact sampling in PCFGs.
Experiment Setup | Yes | Training hyperparameters are listed in Table 3. [...] We used a similar architecture as the one described in (van den Oord et al., 2017), adding batch normalization and additional downsizing and upsizing convolutional layers to obtain the smaller 4 × 4 latent representation. [...] For K ∈ {4, 8} we trained the VQ-VAE model for 50 epochs with a learning rate of 2 × 10^−4, reduced to 5 × 10^−5 at epoch 25. [...] We disabled all batch normalization layers for the GFlowNet experiments and used a batch size of 128 in all our tests.
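The dataset rows above describe the grammar-induction corpus as a length-filtered Penn Treebank subset. Below is a minimal sketch of such a filter; the file names, one-sentence-per-line format, and whitespace tokenization are assumptions for illustration, not the authors' preprocessing (which follows Kim et al., 2019).

```python
MAX_LEN = 20  # keep sentences with 20 or fewer tokens, as in the quoted setup


def load_filtered_ptb(path):
    """Read one whitespace-tokenized sentence per line and keep the short ones."""
    sentences = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.strip().split()
            if 0 < len(tokens) <= MAX_LEN:
                sentences.append(tokens)
    return sentences


# Hypothetical usage with pre-tokenized split files:
# train = load_filtered_ptb("ptb.train.txt")
# valid = load_filtered_ptb("ptb.valid.txt")
# test  = load_filtered_ptb("ptb.test.txt")
```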
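The software-dependencies row states that Torch-Struct is used for marginalization and exact sampling in PCFGs. The sketch below shows how a batched sentence-CFG distribution is built with Torch-Struct's SentCFG interface, assuming the library's padded-CKY tensor convention; the random log-potentials are placeholders, not the paper's learned grammar.

```python
import torch
import torch_struct

# Placeholder sizes: batch, sentence length, nonterminals, preterminals/terminals.
batch, N, NT, T = 2, 10, 8, 16

terms = torch.randn(batch, N, T)                 # log-potentials of terminal productions per token
rules = torch.randn(batch, NT, NT + T, NT + T)   # log-potentials of binary rules
roots = torch.randn(batch, NT)                   # log-potentials of the start symbol
lengths = torch.tensor([10, 7])                  # true (unpadded) sentence lengths

dist = torch_struct.SentCFG((terms, rules, roots), lengths=lengths)
log_Z = dist.partition      # per-sentence log marginal likelihood (inside algorithm)
marginals = dist.marginals  # exact span/rule marginals under the PCFG
```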
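The experiment-setup row quotes a concrete VQ-VAE training schedule: 50 epochs, learning rate 2 × 10^−4 reduced to 5 × 10^−5 at epoch 25, batch size 128. A minimal sketch of that schedule follows; the Adam optimizer, the placeholder model API, and the reconstruction-plus-VQ loss are assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F


def train_vqvae(model, train_loader, epochs=50, device="cuda"):
    """Training-loop sketch: lr 2e-4 for epochs 0-24, then 5e-5 for epochs 25-49."""
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # optimizer choice is an assumption
    model.to(device)
    for epoch in range(epochs):
        if epoch == 25:  # reduce the learning rate halfway through, per the quoted setup
            for group in optimizer.param_groups:
                group["lr"] = 5e-5
        for x, _ in train_loader:  # DataLoader built with batch_size=128
            x = x.to(device)
            recon, vq_loss = model(x)  # hypothetical model returning reconstruction and VQ loss
            loss = F.mse_loss(recon, x) + vq_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```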