Chain of Thought Imitation with Procedure Cloning

Authors: Mengjiao (Sherry) Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through empirical analysis on navigation, simulated robotic manipulation, and game-playing environments, we show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations, including those configurations for which running the expert's procedure directly is infeasible.
Researcher Affiliation | Collaboration | Mengjiao Yang (Google Brain, UC Berkeley) sherryy@google.com; Dale Schuurmans (Google Brain, University of Alberta) schuurmans@google.com; Pieter Abbeel (UC Berkeley) pabbeel@cs.berkeley.edu; Ofir Nachum (Google Brain) ofirnachum@google.com
Pseudocode | Yes | See pseudocode for collecting procedure observations in Appendix A.2. (An illustrative collection loop appears after this table.)
Open Source Code | Yes | https://github.com/google-research/google-research/tree/master/procedure_cloning
Open Datasets | Yes | We adopt the AntMaze environment from D4RL [82]. MinAtar is a miniature version of the Atari Arcade Learning Environment [85]. We generate a set of mazes S_0 ⊆ S and split S_0 into disjoint training (S_0^train) and testing (S_0^test) sets. We then generate expert trajectories by running BFS on only the training set of mazes S_0^train. (A maze-split sketch appears after this table.)
Dataset Splits | No | The paper mentions 'early stopping', which implies the use of a validation set, but it gives no explicit details about the validation split (e.g., percentages or counts). (An illustrative early-stopping setup appears after this table.)
Hardware Specification | Yes | We use 4 Nvidia V100 GPUs for each experiment unless otherwise specified.
Software Dependencies | No | The paper states 'We use JAX [87] for implementation' but does not give version numbers for JAX or any other dependency, which reproducibility requires. (A version-recording snippet appears after this table.)
Experiment Setup | Yes | BC policies are parametrized with convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs); see hyperparameters in Appendix A.5. (A minimal policy-network sketch appears after this table.)
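
The sketches below are illustrative only. First, the procedure-observation idea referenced in the Pseudocode row: the authors' actual pseudocode is in Appendix A.2, but roughly, procedure cloning records the expert's intermediate computations (here, BFS expansion states) alongside its final action. A minimal Python sketch, in which the grid encoding, snapshot format, and the bfs_procedure_observations helper are assumptions rather than the paper's implementation:

```python
from collections import deque

import numpy as np

def bfs_procedure_observations(maze, start, goal):
    """Run BFS from the goal on a 0/1 occupancy grid, recording the
    intermediate visited maps as the expert's "procedure observations".
    Illustrative sketch only; the authors' pseudocode is in Appendix A.2."""
    h, w = maze.shape
    dist = np.full((h, w), -1, dtype=np.int32)  # -1 marks unvisited cells
    dist[goal] = 0
    frontier = deque([goal])
    snapshots = []
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and maze[nr, nc] == 0 and dist[nr, nc] < 0:
                dist[nr, nc] = dist[r, c] + 1
                frontier.append((nr, nc))
        # One snapshot of the visited map per expansion step.
        snapshots.append((dist >= 0).astype(np.float32))
    # The expert action moves from `start` to a neighbor one step closer to the goal.
    for action, (dr, dc) in enumerate(((1, 0), (-1, 0), (0, 1), (0, -1))):
        nr, nc = start[0] + dr, start[1] + dc
        if 0 <= nr < h and 0 <= nc < w and dist[nr, nc] == dist[start] - 1:
            return snapshots, action
    return snapshots, None  # goal unreachable (or start == goal)
```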
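
Next, the disjoint maze split quoted in the Open Datasets row. One conventional way to split layouts so that test mazes are never seen during training; the 80/20 ratio and fixed seed are illustrative choices, not values from the paper:

```python
import random

def split_maze_layouts(mazes, test_fraction=0.2, seed=0):
    """Partition maze layouts into disjoint train/test sets, mirroring the
    paper's S_0^train / S_0^test split. Splitting at the layout level keeps
    every test maze unseen during training."""
    rng = random.Random(seed)
    shuffled = list(mazes)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

Expert BFS trajectories would then be generated only on the returned training layouts, as the quoted passage describes.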
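
For the Dataset Splits row: 'early stopping' implies some held-out validation data. A generic early-stopping loop, where the callables, patience, and epoch budget are all assumptions, since the paper does not report its validation setup:

```python
def train_with_early_stopping(run_train_epoch, validation_loss,
                              max_epochs=100, patience=5):
    """Stop training once validation loss has not improved for `patience`
    consecutive epochs. `run_train_epoch` and `validation_loss` are
    hypothetical callables standing in for the actual training code."""
    best = float("inf")
    epochs_since_best = 0
    for _ in range(max_epochs):
        run_train_epoch()
        loss = validation_loss()
        if loss < best:
            best, epochs_since_best = loss, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break
    return best
```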
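
For the Software Dependencies row: since the paper omits version numbers, a reproduction would need to record them itself. One way to do this in Python, where the package list beyond jax is a guess at the likely stack:

```python
import importlib.metadata as metadata

# Likely stack for this codebase; the exact package list is an assumption.
for pkg in ("jax", "jaxlib", "numpy"):
    print(pkg, metadata.version(pkg))
```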
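
Finally, for the Experiment Setup row: a minimal Flax sketch of a CNN encoder feeding an MLP action head, matching the stated architecture in shape only; layer widths, kernel sizes, and the four-action output are placeholders, not the Appendix A.5 hyperparameters.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class BCPolicy(nn.Module):
    """Behavioral-cloning policy: CNN encoder feeding an MLP action head.
    All sizes below are illustrative placeholders."""
    num_actions: int = 4

    @nn.compact
    def __call__(self, obs):                    # obs: (batch, H, W, C)
        x = nn.relu(nn.Conv(32, kernel_size=(3, 3))(obs))
        x = nn.relu(nn.Conv(64, kernel_size=(3, 3))(x))
        x = x.reshape((x.shape[0], -1))         # flatten spatial features
        x = nn.relu(nn.Dense(256)(x))
        return nn.Dense(self.num_actions)(x)    # action logits

params = BCPolicy().init(jax.random.PRNGKey(0), jnp.zeros((1, 10, 10, 3)))
```

Training would then minimize a cross-entropy loss between these logits and the expert's actions.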