Adversarial Imitation Learning with Preferences

Authors: Aleksandar Taranovic, Andras Gabor Kupcsik, Niklas Freymuth, Gerhard Neumann

ICLR 2023

Reproducibility variables, with the assessed result and the LLM's supporting response:

Research Type: Experimental
  "We experimentally validate the effectiveness of combining both preferences and demonstrations on common benchmarks and also show that our method can efficiently learn challenging robot manipulation tasks."

Researcher Affiliation: Collaboration
  Aleksandar Taranovic (1,2), Andras Kupcsik (2), Niklas Freymuth (1), Gerhard Neumann (1). (1) Autonomous Learning Robots Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany; (2) Bosch Center for Artificial Intelligence, Renningen, Germany.

Pseudocode: Yes
  "we additionally provide pseudocode in Appendix B." and "The AILP algorithm with all individual steps is shown in Alg. 1 below."

Open Source Code: No
  The paper only provides a link (https://github.com/pokaxpoka/B_Pref) to the official implementation of a baseline method (Pebble), stating "We use the official implementation which is contained in the same code repository as for (Lee et al., 2021b)." It does not state that the source code for the proposed method (AILP) is publicly available.

Open Datasets: Yes
  "We consider 6 different manipulation tasks from the metaworld benchmark (Yu et al., 2019)" and "Furthermore, we also evaluate the performance in a Mujoco task, Half Cheetah (Todorov et al., 2012)."

Dataset Splits: No
  The paper does not specify training/validation/test splits. It describes using expert demonstrations and generating samples for training, but does not state how the data is partitioned into distinct sets.

Hardware Specification: No
  The paper does not report the hardware (e.g., GPU/CPU models, memory, or cluster specifications) used to run the experiments.

Software Dependencies: No
  The paper mentions software components such as Soft Actor-Critic (SAC) and the Adam optimizer, but gives no version numbers for these or for any other dependencies (e.g., Python, PyTorch, TensorFlow, or specific libraries).

Experiment Setup: Yes
  "In all experiments we use the same set of 10 random seeds." and "In all evaluated experiments in Section 5 we use the same parameters for SAC and those are listed in Table 2."
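For context on the preference side of the method: preference-based reward learning, as used in the Pebble baseline the paper compares against, typically fits a reward model to pairwise segment comparisons via a Bradley-Terry likelihood. The sketch below is a minimal illustration of that general idea, not the authors' AILP implementation; all function names, the linear reward model, and the toy segments are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of Bradley-Terry preference-based reward learning.
# This is NOT the AILP code; it only illustrates the standard loss used
# in preference-based RL (e.g., the Pebble baseline the paper cites).

def segment_return(reward_model, segment):
    """Sum the learned per-step rewards over a segment of feature vectors."""
    return sum(reward_model(x) for x in segment)

def preference_prob(reward_model, seg_a, seg_b):
    """P(seg_a preferred over seg_b) under the Bradley-Terry model."""
    diff = segment_return(reward_model, seg_a) - segment_return(reward_model, seg_b)
    return 1.0 / (1.0 + np.exp(-diff))

def preference_loss(reward_model, seg_a, seg_b, label):
    """Cross-entropy loss; label = 1.0 if seg_a is preferred, 0.0 otherwise."""
    p = preference_prob(reward_model, seg_a, seg_b)
    eps = 1e-8  # avoid log(0)
    return -(label * np.log(p + eps) + (1.0 - label) * np.log(1.0 - p + eps))

# Toy usage: a linear reward model that already ranks seg_a above seg_b,
# so labeling seg_a as preferred yields a small loss.
w = np.array([1.0, 0.5])
model = lambda x: float(w @ x)
seg_a = [np.array([1.0, 1.0]), np.array([2.0, 0.0])]   # higher learned return
seg_b = [np.array([0.0, 0.0]), np.array([0.5, 0.0])]   # lower learned return
assert preference_loss(model, seg_a, seg_b, 1.0) < preference_loss(model, seg_a, seg_b, 0.0)
```

In a full pipeline this loss would be minimized over a dataset of human-labeled segment pairs, and the resulting reward model would then shape the policy objective alongside the adversarial imitation signal.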