Learning to Edit Visual Programs with Self-Supervision
Authors: R. Kenny Jones, Renhao Zhang, Aditya Ganeshan, Daniel Ritchie
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Over multiple domains, we experimentally compare our method against the alternative of using only the one-shot model, and find that even under equal search-time budgets, our editing-based paradigm provides significant advantages. |
| Researcher Affiliation | Academia | R. Kenny Jones (Brown University, russell_jones@brown.edu); Renhao Zhang (University of Massachusetts, Amherst, renhaozhang@cs.umass.edu); Aditya Ganeshan (Brown University, aditya_ganeshan@brown.edu); Daniel Ritchie (Brown University, daniel_ritchie@brown.edu) |
| Pseudocode | Yes | Algorithm 1 Network Training |
| Open Source Code | Yes | We release code for our experiments at: https://github.com/rkjones4/VPI-Edit |
| Open Datasets | Yes | For 2D CSG we use shapes from the dataset introduced by CSGNet [34], originally sourced from Trimble 3D Warehouse. For 3D CSG we use shapes from the dataset introduced by PLAD [19], originally sourced from ShapeNet [4]. For the Layout domain, we use the manually designed scenes sourced from [18]. |
| Dataset Splits | Yes | In our base experiments, we use 1000/100 train/val shapes for 2D CSG (from 10000/3000 available) and 1000/100 train/val shapes for 3D CSG (from 10000/1000 available). For the Layout domain, we use the manually designed scenes sourced from [18] (1000 train / 100 val / 144 test). |
| Hardware Specification | Yes | All of our experiments are run on NVIDIA GeForce RTX 3090 graphics cards with 24GB of VRAM and consume up to 128GB of RAM (for 3D CSG experiments). |
| Software Dependencies | No | We implement all of our networks in PyTorch [27]. While PyTorch is mentioned, a specific version number is not provided, only a citation to its paper. |
| Experiment Setup | Yes | We use the Adam optimizer [21] with a learning rate of 1e-4. For p(z\|x) pretraining we use a batch size of 128/128/64, for p(e\|z, x) pretraining we use a batch size of 128/128/32, for p(z\|x) finetuning we use a batch size of 20/20/20, and for p(e\|z, x) finetuning we use a batch size of 128/128/32 for the Layout / 2D CSG / 3D CSG domains respectively. We sample programs from p(z\|x) with top-p (0.9) nucleus sampling. We sample edits from p(e\|z, x) with a beam search of size 3. |
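
The hyperparameters in the Experiment Setup row are stated plainly enough to transcribe into code. Below is a minimal sketch of how the optimizer and per-domain batch sizes might be wired up in PyTorch (the paper's stated framework); the `BATCH_SIZES` table, the stage and domain keys, and the `make_optimizer` helper are hypothetical names for illustration, and only the numeric values come from the row above.

```python
import torch

# Per-domain batch sizes quoted from the Experiment Setup row
# (Layout / 2D CSG / 3D CSG). Stage and domain keys are hypothetical.
BATCH_SIZES = {
    "pretrain_p_z_x":  {"layout": 128, "csg2d": 128, "csg3d": 64},
    "pretrain_p_e_zx": {"layout": 128, "csg2d": 128, "csg3d": 32},
    "finetune_p_z_x":  {"layout": 20,  "csg2d": 20,  "csg3d": 20},
    "finetune_p_e_zx": {"layout": 128, "csg2d": 128, "csg3d": 32},
}

def make_optimizer(model: torch.nn.Module) -> torch.optim.Adam:
    """Adam with the stated learning rate of 1e-4."""
    return torch.optim.Adam(model.parameters(), lr=1e-4)
```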
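
The two sampling strategies quoted above (nucleus sampling for programs, beam search for edits) are standard decoding techniques. As a sketch, a self-contained PyTorch implementation of one top-p sampling step could look like the following; the function name and tensor shapes are assumptions, not the authors' released code:

```python
import torch

def top_p_sample(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    """Sample one token id per row from the smallest set of tokens
    whose cumulative probability mass exceeds p (nucleus sampling)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, dim=-1, descending=True)
    cum_probs = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens strictly outside the nucleus: those whose
    # cumulative mass *before* them already exceeds p.
    outside_nucleus = (cum_probs - sorted_probs) > p
    sorted_probs = sorted_probs.masked_fill(outside_nucleus, 0.0)
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, choice)
```

For the edit network, a beam search of size 3 would instead keep the three highest-scoring partial edit sequences at each decoding step rather than drawing stochastic samples.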