Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies

Authors: Seyed Kamyar Seyed Ghasemipour, Shixiang (Shane) Gu, Richard Zemel

NeurIPS 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC. Furthermore, we observe that SMILe performs comparably to or outperforms Meta-DAgger, while being applicable in the state-only setting and not requiring online experts. To our knowledge, our approach is the first efficient method for Meta-IRL that scales to the function approximator setting.
Researcher Affiliation | Collaboration | Seyed Kamyar Seyed Ghasemipour (University of Toronto, Vector Institute); Shixiang Gu (Google Brain); Richard Zemel (University of Toronto, Vector Institute)
Pseudocode | Yes | The SMILe training procedure alternates between generating rollouts and updating models. In this section we present a conceptual overview of SMILe and defer exact details to Algorithm 1 in Appendix A.
Open Source Code | Yes | For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/smile_paper.md.
Open Datasets | Yes | For datasets and reproducing results please refer to https://github.com/KamyarGh/rl_swiss/blob/master/reproducing/smile_paper.md. The Half Cheetah Random Velocity task is a popular baseline for meta-learning in standard RL. The meta-training set consists of 32 target positions located at every integer multiple of π/16 radians on the circle. We use 50 meta-train tasks and perform evaluations on 25 meta-test tasks.
Dataset Splits | Yes | The Half Cheetah Random Velocity task is a popular baseline for meta-learning in standard RL. To evaluate SMILe, we adapt this task for the Few-Shot Imitation Learning setup. Target velocities for meta-train tasks range from 0 to 3, uniformly spaced at 0.1 intervals, and meta-test tasks are defined by the range 0.05 to 2.95, uniformly spaced at 0.1 intervals. The meta-training set consists of 32 target positions located at every integer multiple of π/16 radians on the circle. The meta-testing set consists of 16 targets located at every 2nπ/32 angle on the circle. We use 50 meta-train tasks and perform evaluations on 25 meta-test tasks.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions 'simulated continuous control tasks', which does not imply specific hardware.
Software Dependencies | No | The paper mentions 'Mujoco benchmarks' and 'Soft-Actor-Critic [16]' (which refers to an algorithm, not a software dependency with a version), but does not provide specific software names with version numbers for reproducibility.
Experiment Setup | Yes | The Half Cheetah Random Velocity task is a popular baseline for meta-learning in standard RL. To evaluate SMILe, we adapt this task for the Few-Shot Imitation Learning setup. Each task is defined by a target velocity that we wish a Half Cheetah agent to maintain over the duration of an episode; episodes are of length 1000 and start with the agent at standstill. To obtain expert demonstrations, we train an expert policy using Soft-Actor-Critic [16] which observes as part of the state the desired target velocity. We train all models using various amounts of total expert demonstrations and evaluate on the meta-test tasks using context trajectories generated by the pre-trained expert. Results are reported for training on 4, 16, and 64 demonstrations per meta-train task (4 random seeds per model per setting).
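As a reading aid, the meta-train/meta-test velocity split quoted in the Dataset Splits row can be enumerated with a short sketch. The values and 0.1 spacing come from the excerpt above; the function name is ours and is not from the paper:

```python
# Sketch of the Half Cheetah Random Velocity splits quoted above.
# Meta-train targets: 0.0 to 3.0 at 0.1 intervals (inclusive).
# Meta-test targets: 0.05 to 2.95 at 0.1 intervals, i.e. offset
# halfway between neighboring meta-train velocities.

def velocity_splits():
    meta_train = [round(0.1 * i, 2) for i in range(31)]
    meta_test = [round(0.05 + 0.1 * i, 2) for i in range(30)]
    return meta_train, meta_test

train, test = velocity_splits()
```

This yields 31 meta-train and 30 meta-test velocities, with every test velocity falling between two adjacent training velocities, consistent with the ranges stated in the excerpt.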