Third Person Imitation Learning

Authors: Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and inverted pendulum.
Researcher Affiliation | Collaboration | 1 OpenAI; 2 UC Berkeley, Department of Statistics; 3 UC Berkeley, Departments of EECS and ICSI
Pseudocode | Yes | The entire process is summarized in Algorithm 1. (A sketch of the discriminator step from this loop appears below the table.)
Open Source Code | Yes | Code to train a third person imitation learning agent on the domains from this paper is presented here: https://github.com/bstadie/third_person_im
Open Datasets | No | The paper uses environments from the MuJoCo physics simulator (pointmass, reacher, inverted pendulum) but does not provide access information (links, citations, or repository names) for these environments as datasets or publicly available resources.
Dataset Splits | No | The paper does not specify training, validation, or test split percentages or sample counts for any dataset used.
Hardware Specification | No | The paper does not provide hardware details (e.g., CPU/GPU models, memory) used for running experiments.
Software Dependencies | No | ADAM is used for discriminator training with a learning rate of 0.001. The RL generator uses the off-the-shelf TRPO implementation available in RLLab. While ADAM and RLLab are mentioned, version numbers for these software dependencies are not provided.
Experiment Setup | Yes | Joint Feature Extractor: input images are of size 50 x 50 with 3 channels, RGB. There are 2 convolutional layers, each followed by a max pooling layer of size 2; each layer uses 5 filters of size 3. ... ADAM is used for discriminator training with a learning rate of 0.001. ... a value of 4 showed good performance over all tasks, and so this value was utilized in all other experiments. (A sketch of this extractor appears below the table.)
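
The Experiment Setup row fixes the shape of the joint feature extractor. Below is a minimal sketch of that extractor; PyTorch is an assumed framework (the released code is built on the RLLab stack the paper mentions), and the ReLU activations and unpadded convolutions are assumptions layered on the quoted description (50 x 50 RGB input, two conv layers with 5 filters of size 3, each followed by 2x2 max pooling).

```python
import torch
import torch.nn as nn

class JointFeatureExtractor(nn.Module):
    """Two 3x3 conv layers with 5 filters each, each followed by 2x2
    max pooling, applied to 50x50 RGB frames (per the quoted setup)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 5, kernel_size=3),  # 3x50x50 -> 5x48x48
            nn.ReLU(),                       # activation is an assumption
            nn.MaxPool2d(2),                 # -> 5x24x24
            nn.Conv2d(5, 5, kernel_size=3),  # -> 5x22x22
            nn.ReLU(),
            nn.MaxPool2d(2),                 # -> 5x11x11
            nn.Flatten(),                    # -> 605-dim feature vector
        )

    def forward(self, x):
        return self.net(x)

extractor = JointFeatureExtractor()
feats = extractor(torch.randn(1, 3, 50, 50))  # one dummy RGB frame
print(feats.shape)  # torch.Size([1, 605])
```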
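
Algorithm 1 (the Pseudocode row) alternates several discriminator updates with each TRPO policy update from RLLab; the discriminator both classifies frames as expert vs. novice and carries a domain-confusion term, implemented via gradient reversal, so its shared features cannot encode the viewpoint. The following is a minimal sketch of one such discriminator step under those assumptions: the framework and head sizes are illustrative choices, and the random tensors stand in for real expert and novice frames.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; negates gradients on the backward
    pass, training the shared features to *confuse* the domain head."""
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

# Shared joint feature extractor (same shape as the sketch above).
features = nn.Sequential(
    nn.Conv2d(3, 5, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(5, 5, 3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)
expert_head = nn.Linear(605, 1)  # expert-vs-novice logit (size assumed)
domain_head = nn.Linear(605, 1)  # viewpoint/domain logit (size assumed)

params = (list(features.parameters()) + list(expert_head.parameters())
          + list(domain_head.parameters()))
optimizer = torch.optim.Adam(params, lr=0.001)  # ADAM, lr 0.001 per the paper

# Random stand-ins for a mixed batch of expert and novice frames.
frames = torch.randn(8, 3, 50, 50)
is_expert = torch.randint(0, 2, (8, 1)).float()
domain = torch.randint(0, 2, (8, 1)).float()

z = features(frames)
class_loss = F.binary_cross_entropy_with_logits(expert_head(z), is_expert)
domain_loss = F.binary_cross_entropy_with_logits(
    domain_head(GradientReversal.apply(z)), domain)

optimizer.zero_grad()
(class_loss + domain_loss).backward()  # reversed grads flow into `features`
optimizer.step()
```

Per the quoted setup, such a discriminator step would be repeated a few times (a value of 4 worked well across tasks) before each TRPO update of the policy, whose reward comes from the discriminator's expert-vs-novice output.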