Few-Shot Task Learning through Inverse Generative Modeling

Authors: Aviv Netanyahu, Yilun Du, Antonia Bronars, Jyothish Pari, Josh Tenenbaum, Tianmin Shu, Pulkit Agrawal

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our method in five domains: object rearrangement, goal-oriented navigation, motion capture of human actions, autonomous driving, and real-world table-top manipulation. Our experimental results demonstrate that, via the pretrained generative model, we successfully learn novel concepts and generate agent plans or motion corresponding to these concepts (1) in unseen environments and (2) in composition with training concepts." |
| Researcher Affiliation | Academia | Aviv Netanyahu¹, Yilun Du¹,², Antonia Bronars¹, Jyothish Pari¹, Joshua Tenenbaum¹, Tianmin Shu³, and Pulkit Agrawal¹. ¹Massachusetts Institute of Technology, ²Harvard University, ³Johns Hopkins University |
| Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | "We plan to release our code in the near future." |
| Open Datasets | Yes | "We test our method in a goal-oriented navigation domain adapted from the AGENT dataset [72]... In particular, we use the CMU Graphics Lab Motion Capture Database (http://mocap.cs.cmu.edu/)... In an Autonomous Driving domain [75]..." |
| Dataset Splits | No | The paper describes training datasets and evaluation on new concepts and new initial states, but it does not explicitly specify a validation dataset split. |
| Hardware Specification | Yes | "We run all simulated experiments on a single NVIDIA RTX A4000 machine. We evaluate our method on real-world table-top manipulation tasks using a Franka Research 3 robot with an overhead RealSense D435i RGB camera and an NVIDIA RTX 4090 machine." |
| Software Dependencies | No | The paper mentions software components such as AdamW [86], T5 [66], and PyBullet simulation [81], but does not provide version numbers for these or other key software dependencies (e.g., Python, PyTorch). |
| Experiment Setup | Yes | "We use an MLP with one hidden layer of size 512, ReLU activations, and AdamW [86] with learning rate 6×10⁻⁴... We set the probability of removing conditioning information, p, to 0.1... We learn two concepts with classifier-free guidance weight ω = 1.2... For the In-Context Learning baseline... we use window size K = 1, and in MoCap K = 2." |
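Since the authors' code is not yet released, the sketch below illustrates how the reported setup could be assembled in PyTorch, assuming a diffusion-style conditional denoiser trained with classifier-free guidance. Only the hidden size (512), ReLU activations, AdamW with learning rate 6×10⁻⁴, conditioning-dropout probability p = 0.1, and guidance weight ω = 1.2 come from the paper; every name, dimension, and the null-conditioning convention here is a hypothetical placeholder, not the authors' implementation.

```python
# Minimal sketch of the reported training setup, assuming a diffusion-style
# conditional denoiser with classifier-free guidance (CFG). Timestep
# embeddings and the noise schedule are omitted for brevity; names and
# dimensions are hypothetical.
import torch
import torch.nn as nn


class ConditionalDenoiser(nn.Module):
    """MLP with one hidden layer of size 512 and ReLU activations (as reported)."""

    def __init__(self, x_dim: int, cond_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + cond_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x_noisy: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x_noisy, cond], dim=-1))


model = ConditionalDenoiser(x_dim=32, cond_dim=64)    # placeholder dimensions
opt = torch.optim.AdamW(model.parameters(), lr=6e-4)  # reported optimizer and LR
P_UNCOND = 0.1  # reported probability of removing conditioning information
OMEGA = 1.2     # reported classifier-free guidance weight


def training_step(x_noisy, noise, cond):
    # CFG training: with probability p, replace the conditioning vector with a
    # null (all-zero) embedding so the model also learns an unconditional model.
    drop = torch.rand(cond.shape[0], 1, device=cond.device) < P_UNCOND
    cond = torch.where(drop, torch.zeros_like(cond), cond)
    loss = ((model(x_noisy, cond) - noise) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


@torch.no_grad()
def guided_prediction(x_noisy, cond):
    # Standard CFG combination at sampling time:
    #   eps_hat = eps(x, null) + omega * (eps(x, c) - eps(x, null))
    eps_uncond = model(x_noisy, torch.zeros_like(cond))
    eps_cond = model(x_noisy, cond)
    return eps_uncond + OMEGA * (eps_cond - eps_uncond)
```

With ω > 1, the conditional prediction is amplified relative to the unconditional one. In compositional variants of classifier-free guidance, learning "two concepts" as the quote describes is commonly realized by summing the weighted guidance terms of both concepts against the same unconditional baseline, though the paper's exact composition rule should be checked against the released code once available.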