Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Third Person Imitation Learning
Authors: Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and inverted pendulum. |
| Researcher Affiliation | Collaboration | 1 Open AI 2 UC Berkeley, Department of Statistics 3 UC Berkeley, Departments of EECS and ICSI |
| Pseudocode | Yes | The entire process is summarized in algorithm 1. |
| Open Source Code | Yes | Code to train a third person imitation learning agent on the domains from this paper is presented here: https://github.com/bstadie/third_person_im |
| Open Datasets | No | The paper uses environments from the MuJoCo physics simulator (pointmass, reacher, inverted pendulum) but does not provide specific access information (links, citations, or repository names) for these environments as datasets or publicly available resources. |
| Dataset Splits | No | The paper does not specify exact training, validation, and test split percentages or sample counts for any dataset used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments. |
| Software Dependencies | No | ADAM is used for discriminator training with a learning rate of 0.001. The RL generator uses the off-the-shelf TRPO implementation available in RLLab. While 'ADAM' and 'RLLab' are mentioned, specific version numbers for these software dependencies are not provided. |
| Experiment Setup | Yes | Joint Feature Extractor: Input images are of size 50 x 50 with 3 channels, RGB. Layers are 2 convolutional layers each followed by a max pooling layer of size 2. Layers use 5 filters of size 3 each. ... ADAM is used for discriminator training with a learning rate of 0.001. ... a value of 4 showed good performance over all tasks, and so this value was utilized in all other experiments. |
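The Experiment Setup excerpt fixes the feature extractor's shapes (50 x 50 x 3 input, two 3 x 3 conv layers with 5 filters, each followed by a 2 x 2 max pool) but does not state stride or padding. As a minimal sketch, assuming stride 1 and no padding, the spatial dimensions through the stack can be traced as follows (`feature_extractor_shapes` is a hypothetical helper, not code from the paper's repository):

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def feature_extractor_shapes(input_size=50, n_layers=2, kernel=3, pool=2):
    """Trace spatial dims through conv + max-pool blocks.

    Assumes stride 1 and no padding for the conv, and a
    non-overlapping pool of the given size, since the paper
    excerpt does not specify these.
    """
    sizes = [input_size]
    s = input_size
    for _ in range(n_layers):
        s = conv_output_size(s, kernel)  # 3x3 conv, stride 1, no padding
        s = s // pool                    # 2x2 max pool
        sizes.append(s)
    return sizes

# 50 -> conv3 -> 48 -> pool2 -> 24 -> conv3 -> 22 -> pool2 -> 11
print(feature_extractor_shapes())
```

Under these assumptions the extractor ends at an 11 x 11 map with 5 channels before any downstream layers; with different padding conventions the numbers would shift slightly.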