Adversarial Imitation Learning with Preferences
Authors: Aleksandar Taranovic, Andras Gabor Kupcsik, Niklas Freymuth, Gerhard Neumann
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate the effectiveness of combining both preferences and demonstrations on common benchmarks and also show that our method can efficiently learn challenging robot manipulation tasks. |
| Researcher Affiliation | Collaboration | Aleksandar Taranovic¹,², Andras Kupcsik², Niklas Freymuth¹, Gerhard Neumann¹; ¹ Autonomous Learning Robots Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany; ² Bosch Center for Artificial Intelligence, Renningen, Germany |
| Pseudocode | Yes | 'we additionally provide pseudocode in Appendix B.' and 'The AILP algorithm with all individual steps is shown in Alg. 1 below.' |
| Open Source Code | No | The paper only links (https://github.com/pokaxpoka/B_Pref) to the official implementation of a baseline method (PEBBLE) that it compares against, stating 'We use the official implementation which is contained in the same code repository as for (Lee et al., 2021b).' It does not explicitly state that the source code for the proposed method (AILP) is publicly available. |
| Open Datasets | Yes | 'We consider 6 different manipulation tasks from the metaworld benchmark (Yu et al., 2019)...' and 'Furthermore, we also evaluate the performance in a Mujoco task, Half Cheetah (Todorov et al., 2012).' |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits. It describes using expert demonstrations and generating samples for training, but does not specify how the data is partitioned into training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cluster specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like Soft Actor-Critic (SAC) and the Adam optimizer but does not provide specific version numbers for these or any other software dependencies (e.g., Python, PyTorch, TensorFlow, or specific libraries). |
| Experiment Setup | Yes | 'In all experiments we use the same set of 10 random seeds.' and 'In all evaluated experiments in Section 5 we use the same parameters for SAC and those are listed in Table 2.' |