Chain of Thought Imitation with Procedure Cloning
Authors: Mengjiao (Sherry) Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical analysis on navigation, simulated robotic manipulation, and game-playing environments, we show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations, including those configurations for which running the expert's procedure directly is infeasible. |
| Researcher Affiliation | Collaboration | Mengjiao Yang, Google Brain, UC Berkeley, sherryy@google.com; Dale Schuurmans, Google Brain, University of Alberta, schuurmans@google.com; Pieter Abbeel, UC Berkeley, pabbeel@cs.berkeley.edu; Ofir Nachum, Google Brain, ofirnachum@google.com |
| Pseudocode | Yes | See pseudocode for collecting procedure observations in Appendix A.2. |
| Open Source Code | Yes | https://github.com/google-research/google-research/tree/master/procedure_cloning. |
| Open Datasets | Yes | We adopt the Ant Maze environment from D4RL [82]; MinAtar is a miniature version of the Atari Arcade Learning Environment [85]. We generate a set of mazes S_0 ⊂ S and split S_0 into disjoint training S_0^train and testing S_0^test sets. We then generate expert trajectories by running BFS on only the training set of mazes S_0^train. |
| Dataset Splits | No | The paper mentions 'early stopping' which implies the use of a validation set, but it does not provide explicit details about the split (e.g., percentages or counts) of the validation data. |
| Hardware Specification | Yes | We use 4 Nvidia V100 GPUs for each experiment unless otherwise specified. |
| Software Dependencies | No | The paper mentions 'We use JAX [87] for implementation.' but does not specify the version number for JAX or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | BC policies are parametrized with convolutional neural networks (CNN) and multi-layer perceptrons (MLPs) (see hyperparameters in Appendix A.5). |
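The maze setup described in the Open Datasets row (generate mazes, split them into disjoint train/test sets, then produce expert trajectories by running BFS only on the training mazes) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the maze size, wall density, and start/goal positions are arbitrary assumptions, and the paper's actual pipeline additionally records the BFS intermediate computations as procedure observations.

```python
import random
from collections import deque

def make_maze(size, wall_prob, rng):
    """Random grid maze: 0 = free cell, 1 = wall. Start/goal kept free.
    (Hypothetical generator; the paper's maze construction may differ.)"""
    grid = [[1 if rng.random() < wall_prob else 0 for _ in range(size)]
            for _ in range(size)]
    grid[0][0] = 0
    grid[size - 1][size - 1] = 0
    return grid

def bfs_path(grid, start, goal):
    """Breadth-first search; returns the shortest start->goal path, or None."""
    size = len(grid)
    prev = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            # Reconstruct the path by walking predecessor links backward.
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < size and 0 <= nc < size
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None  # goal unreachable in this maze

rng = random.Random(0)
mazes = [make_maze(8, 0.2, rng) for _ in range(100)]
rng.shuffle(mazes)

# Disjoint train/test split over maze configurations (80/20 is an assumption).
train_mazes, test_mazes = mazes[:80], mazes[80:]

# Expert trajectories come from running BFS on the *training* mazes only;
# test mazes are held out to measure generalization to unseen configurations.
expert_trajectories = []
for maze in train_mazes:
    path = bfs_path(maze, (0, 0), (7, 7))
    if path is not None:  # skip mazes where the goal is unreachable
        expert_trajectories.append(path)
```

The key point the sketch illustrates is that the train/test sets are disjoint at the level of maze configurations, so a cloned policy evaluated on `test_mazes` has never seen those layouts during training.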