Learning to Infer Generative Template Programs for Visual Concepts

Authors: R. Kenny Jones, Siddhartha Chaudhuri, Daniel Ritchie

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments across multiple visual domains: 2D layouts, Omniglot characters, and 3D shapes. We find that our method outperforms task-specific alternatives, and performs competitively against domain-specific approaches for the limited domains where they exist.
Researcher Affiliation | Collaboration | R. Kenny Jones (1), Siddhartha Chaudhuri (2), Daniel Ritchie (1); 1: Brown University, 2: Adobe Research.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks labeled as "Pseudocode" or "Algorithm".
Open Source Code | Yes | We release code for our experiments at: https://github.com/rkjones4/TemplatePrograms
Open Datasets | Yes | Lake et al. (2015) introduced the Omniglot dataset, which contains handwritten characters from 50 languages. We source 10,000 3D shape structures from the chair, table, and storage categories of PartNet (Mo et al., 2019).
Dataset Splits | Yes | We divide these into 216 training-validation concepts and 168 testing concepts, where this split is designed to investigate out-of-distribution generalization performance (Section 4.5). We use the background characters for training and validation, and test on the generalization characters. (A split sketch follows the table.)
Hardware Specification | Yes | We run all experiments on an NVIDIA GeForce RTX 3090 with 24 GB of GPU memory, and 64 GB of RAM.
Software Dependencies | No | The paper mentions PyTorch and the Adam optimizer but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We implement each autoregressive component of p_inf with Transformer decoder models that have 8 layers, 16 heads, and a hidden dimension of 256. We use the Adam optimizer (Kingma & Ba, 2015) to train our networks with a learning rate of 1e-4. During pretraining we set the batch size to max out GPU memory; this amounts to 32 for the 2D layout domain, 40 for the Omniglot domain, 32 for the shape domain with a primitive-soup input, and 16 for the shape domain with voxel inputs (of size 64³). During fine-tuning we set the batch size to 20 for all methods, except for the shape-voxels variant, which we set to 10 to avoid exhausting VRAM.