Fighting Copycat Agents in Behavioral Cloning from Observation Histories
Authors: Chuan Wen, Jierui Lin, Trevor Darrell, Dinesh Jayaraman, Yang Gao
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments to evaluate our method against a variety of baselines. We also qualitatively study our method in order to better understand the newly introduced algorithm. (Section 6) and Table 2: Cumulative rewards per episode in partially observed (PO) environments. The top half of the table shows results in our offline imitation setting. |
| Researcher Affiliation | Academia | Chuan Wen (1), Jierui Lin (2), Trevor Darrell (2), Dinesh Jayaraman (3), Yang Gao (1,2,4). (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) UC Berkeley; (3) University of Pennsylvania; (4) Shanghai Qi Zhi Institute. cwen20@mails.tsinghua.edu.cn, jerrylin0928@berkeley.edu, trevor@eecs.berkeley.edu, dineshj@seas.upenn.edu, gaoyangiiis@tsinghua.edu.cn |
| Pseudocode | Yes | More details and pseudocode are in Appendix. (Section 5) |
| Open Source Code | No | The paper does not include a direct statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Motivated by robotic control, we use the six OpenAI Gym MuJoCo continuous control tasks. (Section 4) and MuJoCo [48] control environments from OpenAI Gym (Section 6.2). |
| Dataset Splits | No | The paper mentions training on 'expert demonstrations' and evaluating on 'held-out trajectories' (Table 1), but it does not provide specific percentages or counts for training, validation, or test splits. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU, or TPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using OpenAI Gym and MuJoCo environments, TRPO for expert generation, the Adam optimizer, and neural networks (MLPs), but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We set the history size H = 2 and train a neural network policy π_θ(a_t | ō_t = [o_t, o_{t−1}]) on expert demonstrations. (Section 4) and In our experiments, the encoder E is a 4-layer MLP, and the decoder F and the adversary D are each 2-layer MLPs. (Section 5) A minimal sketch of this setup appears below the table. |
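
The quoted experiment setup (history length H = 2, a 4-layer MLP encoder E, 2-layer MLP decoder F and adversary D, trained with Adam) can be summarized in code. The following is a minimal PyTorch sketch under those assumptions only; hidden widths, the learning rate, and the observation/action dimensions are placeholders that the excerpts do not specify, and it is not the authors' released implementation.

```python
import torch
import torch.nn as nn

def mlp(sizes, act=nn.ReLU):
    """Fully connected stack: len(sizes) - 1 Linear layers with activations in between."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(act())
    return nn.Sequential(*layers)

class HistoryPolicy(nn.Module):
    """Policy conditioned on the two-observation history ō_t = [o_t, o_{t-1}].

    Layer counts follow the quoted setup (4-layer encoder E, 2-layer decoder F,
    2-layer adversary D); hidden width 256 is a placeholder.
    """
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        in_dim = 2 * obs_dim  # concatenated history, H = 2
        self.encoder = mlp([in_dim, hidden, hidden, hidden, hidden])  # E: 4 layers
        self.decoder = mlp([hidden, hidden, act_dim])                 # F: 2 layers -> action a_t
        self.adversary = mlp([hidden, hidden, act_dim])               # D: 2 layers -> predicts a_{t-1}

    def forward(self, o_t, o_tm1):
        z = self.encoder(torch.cat([o_t, o_tm1], dim=-1))
        return self.decoder(z)

# Shape-only usage example (no MuJoCo dependency needed for this check):
policy = HistoryPolicy(obs_dim=11, act_dim=3)  # Hopper-like dimensions, illustrative only
o_t, o_tm1 = torch.randn(8, 11), torch.randn(8, 11)
print(policy(o_t, o_tm1).shape)  # torch.Size([8, 3])
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # Adam as mentioned; lr is a placeholder
```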