Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Cross-Domain Imitation Learning via Optimal Transport

Authors: Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of GWIL in non-trivial continuous control domains ranging from simple rigid transformation of the expert domain to arbitrary transformation of the state-action space. Our experiments show that GWIL learns optimal behaviors with a single demonstration from another domain without any proxy tasks in non-trivial continuous control settings.
Researcher Affiliation | Collaboration | Arnaud Fickinger (Berkeley AI Research, Facebook AI), Samuel Cohen (University College London, Facebook AI), Stuart Russell (Berkeley AI Research), Brandon Amos (Facebook AI)
Pseudocode | Yes | Algorithm 1: Gromov-Wasserstein imitation learning from a single expert demonstration.
Open Source Code | Yes | Project site with videos and code: https://arnaudfickinger.github.io/gwil/
Open Datasets | Yes | To answer these three questions, we use simulated continuous control tasks implemented in Mujoco (Todorov et al., 2012) and the Deep Mind control suite (Tassa et al., 2018). We evaluate the capacity of IL methods to transfer to rigid transformation of the expert domain by using the Point Mass Maze environment from Hejna et al. (2020).
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software like "Mujoco", "Deep Mind control suite", and "soft actor-critic algorithm" but does not provide specific version numbers for these components.
Experiment Setup | No | The paper does not contain specific experimental setup details such as concrete hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or detailed training configurations in the main text.
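The paper's Algorithm 1 builds on the Gromov-Wasserstein coupling between the expert's and the agent's metric spaces. As an illustration only, not the authors' implementation, here is a minimal entropic Gromov-Wasserstein sketch in NumPy using iterated Sinkhorn projections in the style of Peyré, Cuturi & Solomon (2016); the function names (`sinkhorn`, `entropic_gw`), the regularization `eps`, and the toy rotated point clouds are assumptions made for this sketch.

```python
import numpy as np

def sinkhorn(M, p, q, eps, n_iter=300):
    """Entropic OT: approximate the coupling minimizing <T, M> - eps*H(T)."""
    K = np.exp(-(M - M.min()) / eps)  # shift cost for numerical stability
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_gw(C1, C2, p, q, eps=0.5, outer=30):
    """Entropic Gromov-Wasserstein coupling with square loss.

    C1, C2 are intra-domain distance matrices; p, q are marginals.
    Alternates between linearizing the GW objective and a Sinkhorn solve.
    """
    T = np.outer(p, q)  # product coupling as initialization
    const = (C1**2 @ p)[:, None] + (q @ C2**2)[None, :]
    for _ in range(outer):
        M = const - 2.0 * C1 @ T @ C2.T  # gradient of the GW objective at T
        T = sinkhorn(M, p, q, eps)
    return T

# Toy cross-domain setup: the "expert" domain is a rigid rotation of the
# "agent" domain, so the two intra-domain distance matrices coincide and
# GW can align the spaces without any shared coordinate system.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                       # agent-domain points
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R.T                                       # expert domain: rotated copy
C1 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
C2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
p = q = np.ones(6) / 6
T = entropic_gw(C1, C2, p, q)
```

Because the rotation preserves pairwise distances, the coupling T should concentrate near the identity matching, up to entropic blur; the paper applies this idea to state-action distance matrices built from trajectories rather than raw point clouds.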