Flow Factorized Representation Learning
Authors: Yue Song, Andy Keller, Nicu Sebe, Max Welling
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and thorough analyses have been conducted to show the effectiveness of our method. For example, we demonstrate empirically that our representations are usefully factorized, allowing flexible composability and generalization to new datasets. Furthermore, we show that our methods are also approximately equivariant by demonstrating that they commute with input transformations through the learned latent flows. Ultimately, we see these factors combine to yield the highest likelihood on the test set in each setting. Code is publicly available at https://github.com/KingJamesSong/latent-flow. |
| Researcher Affiliation | Collaboration | Yue Song (1,2), T. Anderson Keller (2,3), Nicu Sebe (1), and Max Welling (2,3). Affiliations: (1) Department of Information Engineering and Computer Science, University of Trento, Italy; (2) Amsterdam Machine Learning Lab, University of Amsterdam, the Netherlands; (3) UvA-Bosch Delta Lab, University of Amsterdam, the Netherlands. Contact: yue.song@unitn.it |
| Pseudocode | Yes | Figure 9: PyTorch-like pseudocode for training our flow-factorized VAE. |
| Open Source Code | Yes | Code is publicly available at https://github.com/KingJamesSong/latent-flow. |
| Open Datasets | Yes | Datasets. We evaluate our method on two widely-used datasets in generative modeling, namely MNIST [54] and Shapes3D [10]. For MNIST [54], we manually construct three simple transformations: Scaling, Rotation, and Coloring. For Shapes3D [10], we use the four self-contained transformations: Floor Hue, Wall Hue, Object Hue, and Scale. Beyond these two common benchmarks, we go a step further and apply our method to Falcor3D and Isaac3D [61], two complex, large-scale, real-world datasets that contain sequences of different transformations. (A hedged sketch of how such MNIST transformations could be constructed is given after the table.) |
| Dataset Splits | No | No explicit statement of training/test/validation splits (e.g., percentages or counts) was found. The paper mentions evaluating on test sets but does not specify how the data was split for training, validation, and testing. |
| Hardware Specification | Yes | All the experiments are run on a single NVIDIA Quadro RTX 6000 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch-like pseudocode' and the 'Adam optimizer' but does not specify version numbers for PyTorch or other libraries. It also mentions the 'ReLU' activation and the 'Gumbel-SoftMax trick', but no associated software versions. |
| Experiment Setup | Yes | Common settings. During the training stage, we randomly sample one single transformation at each iteration. The batch size is set to 128 for both datasets. We use the Adam optimizer with the learning rate set to 1e-4 for all parameters. The encoder consists of four stacked convolution layers with ReLU activations, while the decoder is comprised of four stacked transposed convolution layers. For the prior evolution, the diffusion coefficient Dk is initialized to 0 and set as a learnable parameter for each distinct k. For the MLPs that parameterize the potential u(z, t) and the force f(z, t), we use sinusoidal positional embeddings [86] to embed the timestep t and linear layers to embed the latent code z. Tanh gates are applied as the activation functions of the MLPs. ... The model is trained for 90,000 iterations. (A minimal, hedged sketch of this configuration follows the table.) |
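
As referenced in the Open Datasets row, the MNIST transformations (Scaling, Rotation, Coloring) are constructed manually by the authors. Below is a minimal sketch of how such transformations could be generated with torchvision; the parameter values and helper names (`scale_digit`, `rotate_digit`, `color_digit`) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical construction of the three MNIST transformations mentioned
# above (Scaling, Rotation, Coloring); parameter values are illustrative.
import torch
import torchvision.transforms.functional as TF

def scale_digit(img: torch.Tensor, factor: float = 1.3) -> torch.Tensor:
    # img: (1, 28, 28) grayscale tensor; zoom the digit via an affine transform
    return TF.affine(img, angle=0.0, translate=[0, 0], scale=factor, shear=[0.0])

def rotate_digit(img: torch.Tensor, degrees: float = 45.0) -> torch.Tensor:
    # Rotate the digit around the image center
    return TF.rotate(img, angle=degrees)

def color_digit(img: torch.Tensor, rgb=(1.0, 0.2, 0.2)) -> torch.Tensor:
    # Replicate the grayscale channel to RGB and tint it with a color vector
    color = torch.tensor(rgb).view(3, 1, 1)
    return img.repeat(3, 1, 1) * color

# Example: build a short sequence of progressively rotated frames,
# mimicking a transformation sequence over timesteps t = 0..4.
digit = torch.rand(1, 28, 28)
sequence = [rotate_digit(digit, degrees=15.0 * t) for t in range(5)]
```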
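
The Experiment Setup row describes the architecture and optimization in prose only. The sketch below is a minimal, non-authoritative PyTorch reconstruction of that configuration: the batch size, learning rate, ReLU/Tanh activations, and sinusoidal timestep embedding follow the reported settings, while the channel widths, kernel sizes, latent dimension, and MLP hidden size are assumptions.

```python
# Sketch of the reported training configuration. Widths, kernel sizes, the
# latent dimension, and the MLP hidden size are assumptions; the batch size,
# learning rate, and activations follow the settings quoted above.
import math
import torch
import torch.nn as nn

LATENT_DIM = 32   # assumed
BATCH_SIZE = 128  # as reported
LR = 1e-4         # as reported

def conv_encoder(in_ch: int = 1) -> nn.Sequential:
    # Four stacked convolution layers with ReLU activations
    chs = [in_ch, 32, 64, 128, 256]
    layers = []
    for c_in, c_out in zip(chs[:-1], chs[1:]):
        layers += [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1), nn.ReLU()]
    return nn.Sequential(*layers)

def deconv_decoder(out_ch: int = 1) -> nn.Sequential:
    # Four stacked transposed convolution layers (no activation after the last)
    chs = [256, 128, 64, 32, out_ch]
    layers = []
    for c_in, c_out in zip(chs[:-1], chs[1:]):
        layers += [nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1), nn.ReLU()]
    return nn.Sequential(*layers[:-1])

def sinusoidal_embedding(t: torch.Tensor, dim: int = 64) -> torch.Tensor:
    # Sinusoidal positional embedding of the timestep t
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t.float().unsqueeze(-1) * freqs
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

class Potential(nn.Module):
    # MLP for the potential u(z, t): linear embedding of z, sinusoidal
    # embedding of t, Tanh gates as activations
    def __init__(self, latent_dim: int = LATENT_DIM, t_dim: int = 64, hidden: int = 256):
        super().__init__()
        self.z_embed = nn.Linear(latent_dim, hidden)
        self.t_embed = nn.Linear(t_dim, hidden)
        self.net = nn.Sequential(nn.Tanh(), nn.Linear(hidden, hidden),
                                 nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, z: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = self.z_embed(z) + self.t_embed(sinusoidal_embedding(t))
        return self.net(h)

encoder, decoder, potential = conv_encoder(), deconv_decoder(), Potential()
params = (list(encoder.parameters()) + list(decoder.parameters())
          + list(potential.parameters()))
optimizer = torch.optim.Adam(params, lr=LR)  # Adam with lr 1e-4 for all parameters
```

Under the reported protocol, training would then run for 90,000 iterations with batches of 128, sampling a single transformation per iteration as described in the Experiment Setup row.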