Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Chain of Thought Imitation with Procedure Cloning
Authors: Mengjiao (Sherry) Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical analysis on navigation, simulated robotic manipulation, and game-playing environments, we show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations, including those configurations for which running the expert's procedure directly is infeasible. |
| Researcher Affiliation | Collaboration | Mengjiao Yang Google Brain, UC Berkeley EMAIL; Dale Schuurmans Google Brain, University of Alberta EMAIL; Pieter Abbeel UC Berkeley EMAIL; Ofir Nachum Google Brain EMAIL |
| Pseudocode | Yes | See pseudocode for collecting procedure observations in Appendix A.2. |
| Open Source Code | Yes | https://github.com/google-research/google-research/tree/master/procedure_cloning. |
| Open Datasets | Yes | We adopt the AntMaze environment from D4RL [82]. MinAtar is a miniature version of the Atari Arcade Learning Environment [85]. We generate a set of mazes $S_0 \subset S$ and split $S_0$ into disjoint training $S_0^{\text{train}}$ and testing $S_0^{\text{test}}$ sets. We then generate expert trajectories by running BFS on only the training set of mazes $S_0^{\text{train}}$. |
| Dataset Splits | No | The paper mentions 'early stopping' which implies the use of a validation set, but it does not provide explicit details about the split (e.g., percentages or counts) of the validation data. |
| Hardware Specification | Yes | We use 4 Nvidia V100 GPUs for each experiment unless otherwise specified. |
| Software Dependencies | No | The paper mentions 'We use JAX [87] for implementation.' but does not specify the version number for JAX or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | BC policies are parametrized with convolutional neural networks (CNN) and multi-layer perceptrons (MLPs) (see hyperparameters in Appendix A.5). |
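
The Open Datasets row quotes a maze data procedure: generate mazes, split them into disjoint training and testing sets, and run BFS on the training mazes only to obtain expert trajectories. The following is a minimal sketch of that idea, not the authors' implementation (which is in the repository linked above). The helper names `make_maze`, `bfs_path`, and `split_mazes`, the 5x5 grid, the wall probability, and the 80/20 split are all assumptions chosen for illustration.

```python
import random
from collections import deque


def make_maze(size, wall_prob, rng):
    """Hypothetical generator: random grid where True marks a wall.

    The top-left and bottom-right corners are always left open.
    """
    grid = [[rng.random() < wall_prob for _ in range(size)] for _ in range(size)]
    grid[0][0] = False
    grid[size - 1][size - 1] = False
    return tuple(tuple(row) for row in grid)  # hashable, so mazes can be deduplicated


def bfs_path(grid, start, goal):
    """Breadth-first search; returns the shortest cell path or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parent links back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            nxt = (nr, nc)
            in_bounds = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if in_bounds and not grid[nr][nc] and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


def split_mazes(mazes, train_fraction, rng):
    """Deduplicate, shuffle, and cut into disjoint train/test lists (assumed ratio)."""
    unique = sorted(set(mazes))
    rng.shuffle(unique)
    cut = int(len(unique) * train_fraction)
    return unique[:cut], unique[cut:]


rng = random.Random(0)
mazes = [make_maze(5, 0.2, rng) for _ in range(20)]
train_mazes, test_mazes = split_mazes(mazes, 0.8, rng)

# Expert trajectories come from the training mazes only; unsolvable mazes are skipped.
expert_data = []
for maze in train_mazes:
    path = bfs_path(maze, (0, 0), (4, 4))
    if path is not None:
        expert_data.append((maze, path))
```

Deduplicating before the split is one way to guarantee that no test maze also appears in training; the test mazes are never passed to BFS, so any policy evaluated on them sees layouts for which no expert trajectory was recorded.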