Generating Language Corrections for Teaching Physical Control Tasks

Authors: Megha Srivastava, Noah Goodman, Dorsa Sadigh

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through both automatic and human evaluations, we show that CORGI can (i) generate valid feedback for novel student trajectories, (ii) outperform baselines on domains with novel control dynamics, and (iii) improve student learning in an interactive drawing task." |
| Researcher Affiliation | Academia | "¹Department of Computer Science, Stanford University; ²Department of Psychology, Stanford University." |
| Pseudocode | Yes | "Algorithm 1 Train CORGI" |
| Open Source Code | Yes | "We include information about accessing our dataset, model checkpoints, and user study infrastructure at this link: https://github.com/Stanford-ILIAD/corgi." |
| Open Datasets | Yes | "DRAWING: ...from the Omniglot dataset (Lake et al., 2015). STEERING: ...the Parking environment from Leurent (2018)... MOVEMENT: ...from the BABEL dataset (Punnakkal et al., 2021) of 3D human motion." |
| Dataset Splits | Yes | "We split our training dataset into train and valid splits, and use the latter to perform early stopping." |
| Hardware Specification | Yes | "The trajectory encoder M_traj,θ part of CORGI is trained for 200 epochs on one NVIDIA A40 GPU." |
| Software Dependencies | Yes | "The frozen LM we use is the 124M-parameter version of GPT-2 from Wolf et al. (2019a). ... partially-trained Soft Actor-Critic agents trained for only 100 epochs using the Stable Baselines3 implementation." (see the first sketch below the table) |
| Experiment Setup | Yes | "The trajectory encoder M_traj,θ part of CORGI is trained for 200 epochs on one NVIDIA A40 GPU with a batch size of 64 and learning rate of 0.05... We set the parameter n for M_traj,θ to be 20, so the trajectory encoder outputs a set of 20 vectors with dimension 768. M_traj,θ is a 3-layer feed-forward neural network, where each layer has an output size of n × 768 = 20 × 768." (see the second sketch below the table) |
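
The Software Dependencies row names three concrete components: a frozen 124M-parameter GPT-2 (the HuggingFace transformers implementation), Soft Actor-Critic student agents from Stable Baselines3, and Leurent (2018)'s Parking environment (from highway-env). The following is a minimal sketch of wiring these together; the `"parking-v0"` environment id and the SAC training budget are assumptions for illustration, not values taken from the paper.

```python
# Rough sketch of the dependency stack from the Software Dependencies row.
# Assumes transformers, stable-baselines3, gymnasium, and highway-env are installed.
import gymnasium as gym
import highway_env  # noqa: F401 -- importing registers "parking-v0" (recent versions)
from stable_baselines3 import SAC
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load the 124M-parameter GPT-2 and freeze it, matching the paper's frozen LM.
lm = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
for param in lm.parameters():
    param.requires_grad = False

# A partially-trained SAC agent stands in for a "student". parking-v0 exposes a
# dict observation space, hence the MultiInputPolicy.
env = gym.make("parking-v0")
student = SAC("MultiInputPolicy", env, verbose=0)
student.learn(total_timesteps=10_000)  # illustrative budget, not the paper's "100 epochs"
```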
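
The Experiment Setup row pins down enough of the trajectory encoder to sketch its shape: a 3-layer feed-forward network whose layers each have output size n × 768 (with n = 20), reshaped into a set of 20 vectors of dimension 768 (GPT-2's embedding width). Below is a minimal PyTorch sketch under those constraints; the input dimension, the ReLU activations, and the choice of SGD are assumptions, while the 0.05 learning rate, batch size 64, and 200 epochs come from the quoted setup.

```python
# Minimal sketch of the trajectory encoder M_traj,θ from the Experiment Setup row.
import torch
import torch.nn as nn

N_VECTORS, LM_DIM = 20, 768       # n = 20 output vectors, each of GPT-2's width
HIDDEN = N_VECTORS * LM_DIM       # each layer's output size: 20 × 768

class TrajectoryEncoder(nn.Module):
    def __init__(self, traj_dim: int = 256):  # traj_dim is hypothetical
        super().__init__()
        # 3-layer feed-forward network, each layer with output size 20 × 768.
        self.net = nn.Sequential(
            nn.Linear(traj_dim, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN),
        )

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        # (batch, traj_dim) -> (batch, 20, 768): a set of 20 prefix vectors for the LM.
        return self.net(traj).view(-1, N_VECTORS, LM_DIM)

encoder = TrajectoryEncoder()
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.05)  # optimizer choice assumed
# Training would run for 200 epochs with batch size 64 (per the paper), with
# early stopping on the valid split; the loop itself is omitted here.
```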