Imitation Learning by Estimating Expertise of Demonstrators

Authors: Mark Beliaev, Andy Shih, Stefano Ermon, Dorsa Sadigh, Ramtin Pedarsani

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally show the success of our method compared to standard baselines on 1) simulated datasets for grid-world, 2) human datasets for continuous control, and 3) human datasets for chess endgames. We empirically demonstrate that our learned policy outperforms policies trained without taking into account demonstrator identities, and is comparable to policies trained only on high-quality demonstrations."
Researcher Affiliation | Academia | "(1) Department of Electrical and Computer Engineering, University of California, Santa Barbara; (2) Department of Computer Science, Stanford University."
Pseudocode | No | The paper describes its algorithms and models in text and diagrams (e.g., Figure 2) but does not include any explicit pseudocode blocks or figures labeled 'Algorithm'.
Open Source Code | Yes | "We provide our implementation of ILEED online (Beliaev & Shih, 2022)." The cited entry is: Beliaev, M. and Shih, A. ILEED, June 2022. URL https://github.com/Stanford-ILIAD/ILEED. The paper adds that "working implementations of ILEED and BC are provided in our supplementary material, with code that can be used to reproduce the MiniGrid results."
Open Datasets | Yes | "For the first and last experiment we rely on simulated data, using 4 MiniGrid (Chevalier-Boisvert et al., 2018) environments... For the second experiment, which relies on suboptimal human data, we use the Robomimic dataset and codebase (Mandlekar et al., 2021)... For the third experiment we derive player rankings using human chess game-ending data provided by the lichess database (McIlroy-Young et al., 2020)."
Dataset Splits | No | The paper mentions collecting data, using different combinations of subsets for Robomimic, and splitting the lichess dataset into 5 bins. It also mentions 'evaluating' performance and 'trials'. However, it does not provide explicit train/validation/test split percentages, sample counts, or a general, reproducible methodology for splitting data across all experiments.
Hardware Specification | Yes | "All of the experiments were performed on a machine with the 8C/16T Intel i9-9900K CPU, 32 GB RAM, and an RTX 3080 GPU."
Software Dependencies | No | The paper mentions software such as 'stable baselines 3 (Raffin et al., 2021)', 'Adam (Kingma & Ba, 2017) optimizers', the 'Gym implementation (Chevalier-Boisvert et al., 2018)' of MiniGrid, and the 'Robomimic dataset and codebase (Mandlekar et al., 2021)'. However, it does not provide specific version numbers for these software components, which are necessary for reproducible dependency information.
Experiment Setup | Yes | "We list the specific hyperparameters in Table 10, where we note that we used flattened observations for all MiniGrid experiments, relying on the standard MlpPolicy class provided by stable baselines." Table 10 lists: Policy Class = MlpPolicy; Update Steps = 128; Num. of Environments = 8; Batch Size = 4; Learning Rate = 0.00025; Timesteps = 200000. "For training, we ran 2000 iterations utilizing two Adam (Kingma & Ba, 2017) optimizers, one for parameters θ, ϕ, and ψ with a learning rate of 1e-3, and the other for the expertise parameters ω with a learning rate of 1e-2. We list these parameters in Table 11 below, noting that our implementation is provided as part of the supplementary material. The parameters for GAIL are listed separately in Table 12... For the experiments on chess, we detail the architecture and parameters in Table 13 below." (Hedged sketches of these two setups follow the table.)
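
One way to read the Table 10 row is as the keyword arguments of a stable-baselines3 run on a flattened MiniGrid environment. The sketch below assumes PPO (the algorithm is not named in the excerpt), assumes "Update Steps" maps to SB3's n_steps, and uses an illustrative environment ID and the gym_minigrid FlatObsWrapper for the "flattened observations" the paper mentions; exact package names vary across gym_minigrid/minigrid versions.

```python
# Hedged reconstruction of the Table 10 MiniGrid training setup.
# Assumptions: PPO as the algorithm, "Update Steps" -> n_steps,
# an illustrative env ID, and FlatObsWrapper for flattened observations.
import gym
import gym_minigrid  # registers MiniGrid-* environments with gym
from gym_minigrid.wrappers import FlatObsWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# "Num. of Environments = 8": eight parallel copies, each emitting flat vectors.
env = make_vec_env(
    "MiniGrid-Empty-8x8-v0",  # illustrative; the paper uses 4 MiniGrid tasks
    n_envs=8,
    wrapper_class=FlatObsWrapper,
)

model = PPO(
    "MlpPolicy",
    env,
    n_steps=128,          # "Update Steps = 128" (assumed mapping)
    batch_size=4,         # "Batch Size = 4"
    learning_rate=2.5e-4, # "Learning Rate = 0.00025"
)
model.learn(total_timesteps=200_000)  # "Timesteps = 200000"
```
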
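The two-optimizer description (Adam at 1e-3 for θ, ϕ, ψ; Adam at 1e-2 for the expertise parameters ω) can be sketched in PyTorch as below. This is a minimal sketch under loud assumptions: the excerpt does not specify how the parameters interact, so the model form here, a per-demonstrator expertise score that tempers the shared policy's logits so low expertise collapses toward a uniform policy, is an illustrative guess, with dummy dimensions and data, and with ϕ omitted.

```python
# Minimal sketch of the two-optimizer ILEED-style training loop (Table 11).
# ASSUMPTION: the expertise-tempered-logits model below is illustrative only;
# the paper's exact parameterization is not given in the excerpt.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, N_DEMONSTRATORS = 16, 5, 10

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, N_ACTIONS))       # shared policy (theta)
embed = nn.Linear(STATE_DIM, 8)                        # state embedding (psi)
omega = nn.Parameter(torch.zeros(N_DEMONSTRATORS, 8))  # expertise params (omega)

opt_main = torch.optim.Adam(
    [*policy.parameters(), *embed.parameters()], lr=1e-3)
opt_expertise = torch.optim.Adam([omega], lr=1e-2)

# Dummy batch: states, actions, and the demonstrator ID of each transition.
states = torch.randn(32, STATE_DIM)
actions = torch.randint(0, N_ACTIONS, (32,))
demo_ids = torch.randint(0, N_DEMONSTRATORS, (32,))

for step in range(2000):  # "2000 iterations" from the setup description
    # Per-transition expertise rho in (0, 1) from the demonstrator's omega.
    rho = torch.sigmoid((embed(states) * omega[demo_ids]).sum(-1, keepdim=True))
    logits = rho * policy(states)  # low expertise -> near-uniform policy
    loss = F.cross_entropy(logits, actions)  # negative log-likelihood
    opt_main.zero_grad()
    opt_expertise.zero_grad()
    loss.backward()
    opt_main.step()
    opt_expertise.step()
```

Whatever the exact likelihood, the design point the sketch preserves is the grouping the paper describes: the expertise parameters ω get their own optimizer and learning rate, separate from the policy and embedding parameters.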