Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fighting Copycat Agents in Behavioral Cloning from Observation Histories
Authors: Chuan Wen, Jierui Lin, Trevor Darrell, Dinesh Jayaraman, Yang Gao
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments to evaluate our method against a variety of baselines. We also qualitatively study our method in order to better understand the newly introduced algorithm. (Section 6) and Table 2: Cumulative rewards per episode in partially observed (PO) environments. The top half of the table shows results in our offline imitation setting. |
| Researcher Affiliation | Academia | Chuan Wen 1, Jierui Lin 2, Trevor Darrell 2, Dinesh Jayaraman 3, Yang Gao 1,2,4 — 1 Institute for Interdisciplinary Information Sciences, Tsinghua University; 2 UC Berkeley; 3 University of Pennsylvania; 4 Shanghai Qi Zhi Institute. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | More details and pseudocode are in Appendix. (Section 5) |
| Open Source Code | No | The paper does not include a direct statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Motivated by robotic control, we use the six OpenAI Gym MuJoCo continuous control tasks. (Section 4) and MuJoCo [48] control environments from OpenAI Gym (Section 6.2). |
| Dataset Splits | No | The paper mentions training on 'expert demonstrations' and evaluating on 'held-out trajectories' (Table 1), but it does not provide specific percentages or counts for training, validation, or test splits. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU, or TPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using OpenAI Gym and MuJoCo environments, TRPO for expert generation, the Adam optimizer, and neural networks (MLPs), but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We set the history size H = 2 and train a neural network policy πθ(at \| ōt = [ot, ot−1]) on expert demonstrations. (Section 4) and In our experiments, the encoder E is a 4-layer MLP, and the decoder F and the adversary D are each 2-layer MLPs. (Section 5) |
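The architecture described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: layer widths, activation choice, and the observation/action dimensions are assumptions, while the history size H = 2, the 4-layer encoder E, and the 2-layer decoder F and adversary D come from the quoted text.

```python
import numpy as np

def mlp(sizes, rng):
    """Randomly initialized MLP as a list of (W, b) layers; tanh hidden activations."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:  # no activation on the output layer
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
obs_dim, z_dim, act_dim, H = 11, 32, 3, 2  # assumed dims; H = 2 is from the paper

# Encoder E: 4-layer MLP over the concatenated history [o_t, o_{t-1}]
E = mlp([obs_dim * H, 64, 64, 64, z_dim], rng)
# Decoder F (action head) and adversary D: 2-layer MLPs each
F = mlp([z_dim, 64, act_dim], rng)
D = mlp([z_dim, 64, act_dim], rng)  # adversary predicts the previous action from z

o_hist = rng.standard_normal(obs_dim * H)  # stand-in for [o_t, o_{t-1}]
z = forward(E, o_hist)
action = forward(F, z)
assert action.shape == (act_dim,)
```

In the paper's adversarial scheme, E is trained so that F can predict the current action from z while D cannot recover the previous one; the sketch above only wires up the three networks at their stated depths.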