Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Data Quality in Imitation Learning
Authors: Suneel Belkhale, Yuchen Cui, Dorsa Sadigh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the combined effect of these two key properties in imitation learning theoretically, and we empirically analyze models trained on a variety of different data sources. |
| Researcher Affiliation | Academia | Suneel Belkhale Stanford University EMAIL Yuchen Cui Stanford University EMAIL Dorsa Sadigh Stanford University EMAIL |
| Pseudocode | No | No pseudocode or algorithm blocks explicitly labeled as such were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described. |
| Open Datasets | Yes | In Table 1, we consider single and multi-human datasets from the Square and Can tasks from robomimic [37]. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproduction. It mentions 'training' and 'test time' in the context of distribution shift and high/low data regimes, but not specific splits. |
| Hardware Specification | No | The paper does not specify the hardware used for running experiments, such as GPU or CPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions that 'BC uses an MLP architecture' and 'Transformer architecture results' but does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | We train Behavior Cloning (BC) with data generated with system noise and policy noise in two environments: PMObstacle... and Square... BC uses an MLP architecture. (Section 5.1). Also, the tables show varied noise levels (e.g., "σs = 0.01", "σp = 0.01") and episode counts ("1000 episodes", "10 episodes"). |
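
The Experiment Setup row refers to demonstration data generated with system noise (σs) and policy noise (σp). As a rough illustration only, the sketch below shows one way such data could be generated for a toy one-dimensional point mass. It is not the paper's code or its PMObstacle or Square environments: the dynamics, the proportional demonstrator, the function names `rollout` and `make_dataset`, and all default parameters are assumptions made for this example. It also assumes that policy noise perturbs the demonstrator's action and system noise perturbs the state transition.

```python
import random


def rollout(sigma_s, sigma_p, horizon=20, goal=1.0, gain=0.5, seed=0):
    """Collect one demonstration as a list of (state, action) pairs.

    Hypothetical toy setup: a 1-D point mass driven toward `goal` by a
    proportional controller. None of this is taken from the paper.
    """
    rng = random.Random(seed)
    state = 0.0
    pairs = []
    for _ in range(horizon):
        # Policy noise (assumed): Gaussian perturbation of the demonstrator's action.
        action = gain * (goal - state) + rng.gauss(0.0, sigma_p)
        pairs.append((state, action))
        # System noise (assumed): Gaussian perturbation of the transition itself.
        state = state + action + rng.gauss(0.0, sigma_s)
    return pairs


def make_dataset(num_episodes, sigma_s, sigma_p):
    """Generate `num_episodes` demonstrations, one random seed per episode."""
    return [rollout(sigma_s, sigma_p, seed=i) for i in range(num_episodes)]


if __name__ == "__main__":
    # Episode counts and noise levels mirror the values quoted in the table row.
    large = make_dataset(1000, sigma_s=0.01, sigma_p=0.01)
    small = make_dataset(10, sigma_s=0.01, sigma_p=0.01)
    print(len(large), len(small), len(small[0]))
```

A behavior cloning model would then be fit to the pooled (state, action) pairs; varying `sigma_s`, `sigma_p`, and the episode count separately is what lets the two noise sources be compared in high-data and low-data regimes.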