Fighting Copycat Agents in Behavioral Cloning from Observation Histories

Authors: Chuan Wen, Jierui Lin, Trevor Darrell, Dinesh Jayaraman, Yang Gao

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we conduct experiments to evaluate our method against a variety of baselines. We also qualitatively study our method in order to better understand the newly introduced algorithm." (Section 6) and Table 2: "Cumulative rewards per episode in partially observed (PO) environments. The top half of the table shows results in our offline imitation setting."
Researcher Affiliation | Academia | Chuan Wen (1), Jierui Lin (2), Trevor Darrell (2), Dinesh Jayaraman (3), Yang Gao (1,2,4); (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) UC Berkeley; (3) University of Pennsylvania; (4) Shanghai Qi Zhi Institute; cwen20@mails.tsinghua.edu.cn, jerrylin0928@berkeley.edu, trevor@eecs.berkeley.edu, dineshj@seas.upenn.edu, gaoyangiiis@tsinghua.edu.cn
Pseudocode | Yes | "More details and pseudocode are in Appendix." (Section 5)
Open Source Code | No | The paper does not include a direct statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | "Motivated by robotic control, we use the six OpenAI Gym MuJoCo continuous control tasks." (Section 4) and "MuJoCo [48] control environments from OpenAI Gym" (Section 6.2)
Dataset Splits | No | The paper mentions training on 'expert demonstrations' and evaluating on 'held-out trajectories' (Table 1), but it does not provide specific percentages or counts for training, validation, or test splits.
Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU, or TPU models used for running the experiments.
Software Dependencies | No | The paper mentions using OpenAI Gym and MuJoCo environments, TRPO for expert generation, the Adam optimizer, and neural networks (MLPs), but it does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "We set the history size H = 2 and train a neural network policy π_θ(a_t | ō_t = [o_t, o_{t-1}]) on expert demonstrations." (Section 4) and "In our experiments, the encoder E is a 4-layer MLP, and the decoder F and the adversary D are each 2-layer MLPs." (Section 5) A rough sketch of this setup is given below the table.
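The experiment-setup row describes the architecture only at the level of layer counts and history size. Below is a minimal sketch of how those pieces could be wired together, assuming PyTorch and OpenAI Gym; the hidden width, learning rate, the choice of Hopper-v2 as a representative MuJoCo task, and feeding the adversary D the encoder's embedding are all assumptions not stated in the quoted excerpts, so this illustrates the stated shapes rather than the authors' implementation.

```python
# Hedged sketch of the stated setup: history size H = 2, a 4-layer MLP encoder E,
# 2-layer MLP decoder F and adversary D, Adam optimizer. Hidden width, learning rate,
# and the specific environment are assumptions.
import gym
import torch
import torch.nn as nn

env = gym.make("Hopper-v2")              # assumed: one of the six OpenAI Gym MuJoCo tasks
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]
H = 2                                    # history size stated in the paper

def mlp(sizes):
    """Plain MLP with ReLU activations between consecutive linear layers."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

hidden = 256                                             # assumed width
E = mlp([H * obs_dim, hidden, hidden, hidden, hidden])   # encoder: 4-layer MLP
F = mlp([hidden, hidden, act_dim])                       # decoder: 2-layer MLP predicting a_t
D = mlp([hidden, hidden, act_dim])                       # adversary: 2-layer MLP on E's embedding (assumed wiring)

policy_opt = torch.optim.Adam(list(E.parameters()) + list(F.parameters()), lr=3e-4)  # lr assumed
adversary_opt = torch.optim.Adam(D.parameters(), lr=3e-4)

def policy(o_t, o_prev):
    """pi_theta(a_t | [o_t, o_{t-1}]) = F(E([o_t, o_{t-1}]))."""
    history = torch.cat([o_t, o_prev], dim=-1)
    return F(E(history))
```

The adversarial min-max training loop that pits E and F against D is only alluded to in the quoted text, so it is left out of this sketch.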