Learn what matters: cross-domain imitation learning with task-relevant embeddings

Authors: Tim Franzmeyer, Philip Torr, João F. Henriques

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our cross-domain imitation learning approach in different cross-embodiment imitation learning scenarios, comparing on relevant benchmarks, and find that our method robustly learns policies that clearly outperform the baselines. We conduct several ablation studies, in particular finding that we can control how much domain-specific information is transferred from the expert, effectively interpolating between mimicking the expert's behaviour as much as possible and finding novel policies that use different strategies to maximize the expert's reward. (Section 5: Experiments)
Researcher Affiliation | Academia | Tim Franzmeyer, University of Oxford (frtim@robots.ox.ac.uk); Philip H. S. Torr, University of Oxford (philip.torr@eng.ox.ac.uk); João F. Henriques, University of Oxford (joao@robots.ox.ac.uk)
Pseudocode | No | The paper describes the proposed method in Section 4, 'Unsupervised Imitation Learning Across Domains', and illustrates it in Figure 1, but it does not include a formal pseudocode block or an algorithm labelled as such.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for its methodology. It mentions using 'the XIRL implementation as given by the authors', but this refers to a third-party implementation, not the authors' own.
Open Datasets | Yes | We test our approach on two different benchmarks that represent multiple domains and different agents with both environment-based and agent-based tasks. Figure 2: We consider a robot learning to place an apple onto a plate from demonstrations of a human doing so. This illustrative cross-domain imitation learning problem requires finding the learner's policy π_L in its domain with states s_L from demonstrations generated by the human expert (D_E) in the distinct expert domain with states s_E. We use the dataset of expert demonstrations provided by Zakka et al. [45] to compare the performance of our approach to that of the XIRL baseline. We now evaluate UDIL in the complex Mujoco environments [7, 38].
Dataset Splits | No | The paper mentions 'train' conceptually and discusses evaluation, but does not provide specific percentages or sample counts for training, validation, and test splits for its experiments. It refers to using DAC [20] and not altering its hyperparameters, implying standard settings from that work, but does not state the authors' specific data splits.
Hardware Specification | No | The paper does not explicitly specify any hardware used for running the experiments, such as GPU models, CPU models, or cloud computing instances. It refers to 'robot arm', 'hopper', 'halfcheetah', and 'walker' agents and 'Mujoco environments', but these are simulation environments/agents, not hardware specifications for the computing resources.
Software Dependencies | No | The paper mentions software environments and frameworks such as 'Openai gym [7]', 'Mujoco environments [7, 38]', 'DAC [20]', and the 'XIRL implementation'. However, it does not provide specific version numbers for any of these, or for general programming languages/libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | No | The paper states: 'We use DAC [20] to jointly train g, π_L and D, as depicted in Figure 1, and do not alter any hyperparameters given in the original implementation to ensure comparability.' While this indicates hyperparameters were used, they are not explicitly provided or described within this paper's text; rather, they are referenced from an external source ([20]). The prompt requires the details to be *in the paper*.
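
Note on the setup quoted above: since no code is released, the snippet below is only a minimal sketch of the DAC/GAIL-style joint training that the quoted sentence refers to, i.e. a discriminator D trained on task-relevant embeddings of expert and learner states, whose output defines a learned reward for the learner policy. All network architectures, dimensions, variable names, and the placeholder state batches are assumptions made for illustration; the paper's actual objective for training the embedding g and the full DAC (TD3-based) policy update are not reproduced here.

```python
# Hypothetical sketch only: the paper releases no code, and DAC's off-policy RL
# update (TD3) is omitted. Network sizes, batch contents, and variable names
# (g_expert, g_learner, discriminator, expert_states, learner_states) are placeholders.
import torch
import torch.nn as nn

state_dim_expert, state_dim_learner, embed_dim = 17, 11, 8

# Domain-specific encoders mapping raw states into a shared task-relevant embedding space.
g_expert = nn.Sequential(nn.Linear(state_dim_expert, 64), nn.ReLU(), nn.Linear(64, embed_dim))
g_learner = nn.Sequential(nn.Linear(state_dim_learner, 64), nn.ReLU(), nn.Linear(64, embed_dim))
# Discriminator D operating on embeddings rather than raw states.
discriminator = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_d = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

# Placeholder batches standing in for expert demonstrations and learner rollouts.
expert_states = torch.randn(64, state_dim_expert)
learner_states = torch.randn(64, state_dim_learner)

# Discriminator step: label expert embeddings 1 and learner embeddings 0.
# (Embeddings are detached here for simplicity; how g is trained jointly is
# exactly the part specified by the paper's method and is not reproduced here.)
d_loss = bce(discriminator(g_expert(expert_states).detach()), torch.ones(64, 1)) \
       + bce(discriminator(g_learner(learner_states).detach()), torch.zeros(64, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# GAIL/DAC-style learned reward for the learner's transitions; in DAC this reward
# would drive an off-policy RL update of the learner policy pi_L (omitted here).
with torch.no_grad():
    learned_reward = -torch.log(1.0 - torch.sigmoid(discriminator(g_learner(learner_states))) + 1e-8)
```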