Learning to Edit Visual Programs with Self-Supervision

Authors: R. Kenny Jones, Renhao Zhang, Aditya Ganeshan, Daniel Ritchie

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Over multiple domains, we experimentally compare our method against the alternative of using only the one-shot model, and find that even under equal search-time budgets, our editing-based paradigm provides significant advantages."
Researcher Affiliation | Academia | R. Kenny Jones (Brown University, russell_jones@brown.edu); Renhao Zhang (University of Massachusetts, Amherst, renhaozhang@cs.umass.edu); Aditya Ganeshan (Brown University, aditya_ganeshan@brown.edu); Daniel Ritchie (Brown University, daniel_ritchie@brown.edu)
Pseudocode | Yes | "Algorithm 1 Network Training"
Open Source Code | Yes | "We release code for our experiments at: https://github.com/rkjones4/VPI-Edit"
Open Datasets | Yes | "For 2D CSG we use shapes from the dataset introduced by CSGNet [34], originally sourced from Trimble 3D Warehouse. For 3D CSG we use shapes from the dataset introduced by PLAD [19], originally sourced from ShapeNet [4]. For the Layout domain, we use the manually designed scenes sourced from [18]."
Dataset Splits | Yes | "In our base experiments, we use 1000/100 train/val shapes for 2D CSG (from 10000/3000 available) and 1000/100 train/val shapes for 3D CSG (from 10000/1000 available). For the Layout domain, we use the manually designed scenes sourced from [18] (1000 train / 100 val / 144 test)."
Hardware Specification | Yes | "All of our experiments are run on NVIDIA GeForce RTX 3090 graphics cards with 24GB of VRAM and consume up to 128GB of RAM (for 3D CSG experiments)."
Software Dependencies | No | "We implement all of our networks in PyTorch [27]." While PyTorch is mentioned, a specific version number is not provided, only a citation to its paper.
Experiment Setup | Yes | "We use the Adam optimizer [21] with a learning rate of 1e-4. For p(z|x) pretraining we use a batch size of 128/128/64, for p(e|z, x) pretraining we use a batch size of 128/128/32, for p(z|x) finetuning we use a batch size of 20/20/20, and for p(e|z, x) finetuning we use a batch size of 128/128/32 for the Layout / 2D CSG / 3D CSG domains, respectively. We sample programs from p(z|x) with top-p (0.9) nucleus sampling. We sample edits from p(e|z, x) with a beam search of size 3."
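
For readers who want to mirror this configuration, the following is a minimal sketch of the optimizer and nucleus-sampling pieces described in the Experiment Setup row, assuming a PyTorch decoder that emits per-token logits; the function names and batch-size tables are our own illustration and are not taken from the released VPI-Edit repository.

```python
# Minimal sketch of the reported training configuration (assumptions noted
# above): Adam with lr 1e-4, per-domain batch sizes, and top-p (0.9) sampling.
import torch
import torch.nn.functional as F

# Per-domain batch sizes (Layout, 2D CSG, 3D CSG) as reported in the paper.
PRETRAIN_BATCH = {"p(z|x)": (128, 128, 64), "p(e|z,x)": (128, 128, 32)}
FINETUNE_BATCH = {"p(z|x)": (20, 20, 20), "p(e|z,x)": (128, 128, 32)}

def make_optimizer(model: torch.nn.Module, lr: float = 1e-4) -> torch.optim.Adam:
    """Adam optimizer with the stated learning rate of 1e-4."""
    return torch.optim.Adam(model.parameters(), lr=lr)

def nucleus_sample(logits: torch.Tensor, top_p: float = 0.9) -> torch.Tensor:
    """Top-p (nucleus) sampling over a (batch, vocab) tensor of logits."""
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cum_probs = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens once the cumulative mass of the tokens before them
    # exceeds top_p; subtracting sorted_probs keeps the top token in play.
    sorted_probs[cum_probs - sorted_probs > top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, choice)
```

Edits from p(e|z, x) would instead be decoded with a beam search of width 3, which this sketch does not cover.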