TaSIL: Taylor Series Imitation Learning

Authors: Daniel Pfrommer, Thomas Zhang, Stephen Tu, Nikolai Matni

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate experimentally the relationship between the robustness of the expert policy and the order of Taylor expansion required in TaSIL, and compare standard Behavior Cloning, DART, and DAgger with TaSIL-loss-augmented variants. In all cases, we show significant improvement over baselines across a variety of MuJoCo tasks.
Researcher Affiliation | Collaboration | Daniel Pfrommer, Massachusetts Institute of Technology, Cambridge, MA, dpfrom@mit.edu; Thomas T.C.K. Zhang, University of Pennsylvania, Philadelphia, PA, ttz2@seas.upenn.edu; Stephen Tu, Robotics at Google, New York, NY, stephentu@google.com; Nikolai Matni, University of Pennsylvania, Philadelphia, PA, nmatni@seas.upenn.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks with clear labels like 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | The code used for these experiments can be found at https://github.com/unstable-zeros/TaSIL
Open Datasets | Yes | MuJoCo Experiments: We evaluate the ability of the TaSIL loss to improve performance on standard imitation learning tasks by modifying Behavior Cloning, DAgger [6], and DART [7] to use the ℓ_TaSIL,1 loss and testing them in simulation on different OpenAI Gym MuJoCo tasks [29]. The MuJoCo environments we use and their corresponding (state, input) dimensions are: Walker2d-v3 (17, 6), HalfCheetah-v3 (17, 6), Humanoid-v3 (376, 17), and Ant-v3 (111, 8). [29] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
Dataset Splits | No | The paper mentions using 'test trajectories' for evaluation and averaging across '50 test trajectories' and '10 random seeds' in Figure 1, but it does not specify explicit training, validation, and test dataset splits by percentages or counts, nor does it describe a cross-validation setup.
Hardware Specification | No | All experiments are carried out using Jax [25] GPU acceleration and automatic differentiation capabilities... (Section 5). The paper mentions 'GPU acceleration' but does not specify the exact GPU model or any other hardware components like CPU or memory details.
Software Dependencies | No | All experiments are carried out using Jax [25] GPU acceleration and automatic differentiation capabilities and the Flax [26] neural network and Optax [27] optimization toolkits. (Section 5). The paper lists the software tools used (Jax, Flax, Optax) but does not provide specific version numbers for any of them.
Experiment Setup | Yes | We use η = 0.95 for all experiments presented here. ... We sweep class-K functions γ(x) = Cx^ν for ν ∈ [0.05, 3], C = 5, and p-TaSIL loss functions for p ∈ {0, 1, 2} ... For all environments we use pretrained expert policies ... The experts consist of Multi-Layer Perceptrons with two hidden layers of 256 units each and ReLU activations. For all environments, learned policies have 2 hidden layers with 512 units each and GELU activations in addition to Batch Normalization. The final policy outputs for both the expert and learned policy are rescaled to the valid action space after applying a tanh nonlinearity. We used trajectories of length T = 300 for all experiments.
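
The Experiment Setup row above pins down the learned-policy architecture (2 hidden layers of 512 units, GELU, Batch Normalization, and a tanh output rescaled to the valid action space). The following is a minimal sketch of that architecture in Flax/JAX, the toolkits the paper reports using. The module name, the placement of BatchNorm before the GELU, and the omission of the final action-space rescaling are illustrative assumptions, not the authors' implementation (see the linked repository for that).

    import jax
    import jax.numpy as jnp
    import flax.linen as nn

    class LearnedPolicy(nn.Module):
        """Sketch of the learned policy: 2 x 512 hidden units, GELU, BatchNorm,
        tanh-squashed output (rescaling to the valid action space omitted)."""
        act_dim: int

        @nn.compact
        def __call__(self, x, train: bool = False):
            for _ in range(2):
                x = nn.Dense(512)(x)
                x = nn.BatchNorm(use_running_average=not train)(x)  # ordering relative to GELU is an assumption
                x = nn.gelu(x)
            return jnp.tanh(nn.Dense(self.act_dim)(x))

    # Example with Walker2d-v3 dimensions (state dim 17, action dim 6):
    policy = LearnedPolicy(act_dim=6)
    variables = policy.init(jax.random.PRNGKey(0), jnp.zeros((1, 17)), train=True)
    actions = policy.apply(variables, jnp.zeros((1, 17)), train=False)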
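
The same row also mentions sweeping p-TaSIL losses for p ∈ {0, 1, 2}. As a rough illustration of the underlying idea (matching the expert's policy and its derivatives along expert states), here is a hedged sketch of a first-order (p = 1) variant, continuing from the imports above. The squared-norm choice, the uniform weighting of the two terms, and the function names are assumptions; the paper's exact loss and its γ-weighting may differ, so treat this as a sketch of the concept rather than the authors' objective.

    def tasil_p1_loss(learned_fn, expert_fn, expert_states):
        """Hypothetical first-order TaSIL-style loss: penalize mismatch of the
        policies and of their state Jacobians, averaged over expert states."""
        def per_state(x):
            zeroth = jnp.sum((learned_fn(x) - expert_fn(x)) ** 2)
            first = jnp.sum((jax.jacrev(learned_fn)(x) - jax.jacrev(expert_fn)(x)) ** 2)
            return zeroth + first
        return jnp.mean(jax.vmap(per_state)(expert_states))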