Imitation Learning from Imperfection: Theoretical Justifications and Algorithms

Authors: Ziniu Li, Tian Xu, Zeyu Qin, Yang Yu, Zhi-Quan Luo

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical studies demonstrate that our method outperforms previous state-of-the-art methods in tasks including robotic locomotion control, Atari video games, and image classification."
Researcher Affiliation | Collaboration | Ziniu Li (1,2), Tian Xu (3,4), Zeyu Qin (5), Yang Yu (3,4), and Zhi-Quan Luo (1,2). (1) The Chinese University of Hong Kong, Shenzhen; (2) Shenzhen Research Institute of Big Data; (3) National Key Laboratory for Novel Software Technology, Nanjing University; (4) Polixir.ai; (5) Hong Kong University of Science and Technology
Pseudocode | Yes | "Algorithm 1 ISW-BC" (a hedged sketch of a weighted behavioral cloning update follows the table)
Open Source Code | Yes | "The code is available at https://github.com/liziniu/ISWBC."
Open Datasets | Yes | "We use the replay buffer data from an online DQN agent, which is publicly available at https://console.cloud.google.com/storage/browser/atari-replay-datasets, thanks to the work of [2]. We use a famous dataset, DomainNet [36]."
Dataset Splits | No | The paper specifies a train/test split for image classification ("80% for training and 20% for testing") but does not describe a separate validation split for any of its experiments.
Hardware Specification | Yes | "The experiments are conducted on a machine comprising 48 CPU cores and 4 V100 GPU cores."
Software Dependencies | No | The paper names software such as the rlkit codebase, the Adam optimizer, a ResNet-18 model, and CVXPY, but does not give version numbers for any of them.
Experiment Setup | Yes | "We use a 2-hidden-layer multi-layer perceptron (MLP) with hidden size 256 and ReLU activation for all algorithms... We use a batch size of 256 and Adam optimizer with a learning rate of 0.0003 for training both networks. The training process is carried out for 1 million iterations. We set δ to 0 and use a gradient penalty coefficient of 8 by default." (a minimal configuration sketch also follows the table)
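
To give a feel for what "Algorithm 1 ISW-BC" (importance-sampling-weighted behavioral cloning) looks like in code, here is a minimal, hypothetical PyTorch sketch. It assumes the common discriminator-based recipe: a classifier is trained to separate expert state-action pairs from the union (expert plus supplementary) data, and its output c induces a per-sample weight c/(1-c) on the BC loss. The function names, the `policy.log_prob` interface, and the exact weight formula are assumptions for illustration; the paper's δ threshold and gradient penalty are omitted. The authors' actual implementation is at https://github.com/liziniu/ISWBC.

```python
# Hypothetical sketch of an importance-sampling-weighted BC step.
# Names and details are illustrative assumptions, not the authors' code;
# see https://github.com/liziniu/ISWBC for the real Algorithm 1 (ISW-BC).
import torch
import torch.nn.functional as F

def discriminator_step(disc, disc_opt, expert_sa, union_sa):
    """Train disc to output ~1 on expert (s, a) pairs and ~0 on union
    (expert + supplementary) pairs; its output induces a weight below."""
    logits_e = disc(expert_sa)
    logits_u = disc(union_sa)
    loss = (F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e))
            + F.binary_cross_entropy_with_logits(logits_u, torch.zeros_like(logits_u)))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()

def weighted_bc_step(policy, policy_opt, disc, states, actions):
    """One BC update on union data, with each (s, a) pair reweighted by
    c / (1 - c), where c = sigmoid(disc(s, a)); weights are detached."""
    with torch.no_grad():
        c = torch.sigmoid(disc(torch.cat([states, actions], dim=-1)))
        w = c / (1.0 - c).clamp(min=1e-6)  # estimated importance weight
    log_prob = policy.log_prob(states, actions)  # assumes a stochastic policy head
    loss = -(w.squeeze(-1) * log_prob).mean()
    policy_opt.zero_grad()
    loss.backward()
    policy_opt.step()
```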
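
To make the quoted hyperparameters concrete, the following minimal PyTorch sketch instantiates the reported configuration: a 2-hidden-layer MLP with hidden size 256 and ReLU activations, trained with Adam at learning rate 0.0003 and batch size 256. The input/output dimensions are placeholder assumptions; the paper does not tie them to a single snippet.

```python
# Minimal sketch of the quoted setup. Dimensions are placeholders
# (e.g., a MuJoCo locomotion task), not values from the paper.
import torch
import torch.nn as nn

obs_dim, act_dim = 17, 6  # placeholder assumptions

policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # "learning rate of 0.0003"
BATCH_SIZE = 256            # per the quoted setup
NUM_ITERATIONS = 1_000_000  # "1 million iterations"
```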