Third Person Imitation Learning
Authors: Bradly C Stadie, Pieter Abbeel, Ilya Sutskever
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and inverted pendulum. |
| Researcher Affiliation | Collaboration | 1 OpenAI; 2 UC Berkeley, Department of Statistics; 3 UC Berkeley, Departments of EECS and ICSI |
| Pseudocode | Yes | The entire process is summarized in Algorithm 1. |
| Open Source Code | Yes | Code to train a third person imitation learning agent on the domains from this paper is presented here: https://github.com/bstadie/third_person_im |
| Open Datasets | No | The paper uses environments from the MuJoCo physics simulator (pointmass, reacher, inverted pendulum) but does not provide specific access information (links, citations, or repository names) for these environments as datasets or publicly available resources. |
| Dataset Splits | No | The paper does not specify exact training, validation, and test split percentages or sample counts for any dataset used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments. |
| Software Dependencies | No | ADAM is used for discriminator training with a learning rate of 0.001. The RL generator uses the off-the-shelf TRPO implementation available in RLLab. While 'ADAM' and 'RLLab' are mentioned, specific version numbers for these software dependencies are not provided. |
| Experiment Setup | Yes | Joint Feature Extractor: Input images are 50 x 50 with 3 channels (RGB). The network has 2 convolutional layers, each followed by a max pooling layer of size 2. Each layer uses 5 filters of size 3. ... ADAM is used for discriminator training with a learning rate of 0.001. ... a value of 4 showed good performance over all tasks, and so this value was utilized in all other experiments. |
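
The Experiment Setup row gives enough detail to estimate the size of the joint feature extractor's output, which may help when checking a reimplementation. The sketch below is not taken from the paper or its repository: it only applies standard convolution and pooling shape arithmetic to the quoted numbers (50 x 50 input, 2 convolutional layers with 5 filters of size 3, each followed by max pooling of size 2). The excerpt does not state stride or padding, so stride 1 is assumed and both unpadded and one-pixel-padded variants are shown. The function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a square convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial output size of non-overlapping max pooling (floor division)."""
    return size // window


def feature_extractor_shape(size=50, filters=5, kernel=3, padding=0, n_blocks=2):
    """Shape (channels, height, width) after n_blocks of conv + max pool.

    Stride 1 and the padding value are assumptions; the quoted setup
    does not specify them.
    """
    for _ in range(n_blocks):
        size = pool_out(conv_out(size, kernel=kernel, padding=padding))
    return (filters, size, size)


# Assumption A: no padding ("valid" convolutions).
valid_shape = feature_extractor_shape(padding=0)
# Assumption B: one pixel of zero padding ("same" convolutions for kernel 3).
same_shape = feature_extractor_shape(padding=1)

print(valid_shape)  # (5, 11, 11), i.e. 605 features when flattened
print(same_shape)   # (5, 12, 12), i.e. 720 features when flattened
```

Under either assumption the extracted feature vector is small (a few hundred values), which is consistent with the lightweight architecture the row describes.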