Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Authors: Ziniu Li, Tian Xu, Zeyu Qin, Yang Yu, Zhi-Quan Luo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies demonstrate that our method outperforms previous state-of-the-art methods in tasks including robotic locomotion control, Atari video games, and image classification. |
| Researcher Affiliation | Collaboration | Ziniu Li (1,2), Tian Xu (3,4), Zeyu Qin (5), Yang Yu (3,4), and Zhi-Quan Luo (1,2). 1: The Chinese University of Hong Kong, Shenzhen; 2: Shenzhen Research Institute of Big Data; 3: National Key Laboratory for Novel Software Technology, Nanjing University; 4: Polixir.ai; 5: Hong Kong University of Science and Technology |
| Pseudocode | Yes | Algorithm 1 ISW-BC (a hedged sketch of the weighted behavior-cloning update appears after this table) |
| Open Source Code | Yes | The code is available at https://github.com/liziniu/ISWBC. |
| Open Datasets | Yes | We use the replay buffer data from an online DQN agent, which is publicly available at https://console.cloud.google.com/storage/browser/atari-replay-datasets, thanks to the work of [2]. We use a famous dataset, DomainNet [36]. |
| Dataset Splits | No | The paper specifies a train/test split for image classification ("80% for training and 20% for testing") but does not explicitly mention or detail a separate validation set split across its experiments. |
| Hardware Specification | Yes | The experiments are conducted on a machine comprising 48 CPU cores and 4 V100 GPUs. |
| Software Dependencies | No | The paper mentions software like "rlkit codebase", "Adam optimizer", "ResNet-18 model", and "CVXPY" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use a 2-hidden-layer multi-layer perceptron (MLP) with hidden size 256 and ReLU activation for all algorithms... We use a batch size of 256 and Adam optimizer with a learning rate of 0.0003 for training both networks. The training process is carried out for 1 million iterations. We set δ to 0 and use a gradient penalty coefficient of 8 by default. (A hedged training skeleton wiring these hyperparameters together follows the table.) |
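
To make the Pseudocode row concrete, here is a minimal PyTorch sketch of the importance-sampling-weighted behavior-cloning (ISW-BC) update named in Algorithm 1. The network shape follows the reported setup (2 hidden layers of width 256, ReLU); the discriminator-based weight formula, the squared-error surrogate for the BC objective, and all tensor shapes are assumptions made for illustration, not the authors' implementation (see https://github.com/liziniu/ISWBC for that).

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """2-hidden-layer MLP, hidden size 256, ReLU, as reported in the setup."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def isw_bc_step(policy, discriminator, pol_opt, union_batch, delta=0.0):
    """One policy update: behavior cloning on the union of expert and
    supplementary data, reweighted by an assumed density-ratio estimate
    c / (1 - c) from a discriminator c(s, a) trained to output 1 on expert
    pairs and 0 on union pairs."""
    s, a = union_batch  # states and continuous actions from the union buffer
    with torch.no_grad():
        c = torch.sigmoid(discriminator(torch.cat([s, a], dim=-1)))
        w = c / (1.0 - c).clamp(min=1e-6)          # assumed importance weight
        w = torch.where(c >= delta, w, torch.zeros_like(w))  # δ = 0 keeps all pairs
    # Squared error stands in for the paper's log-likelihood BC objective.
    loss = (w.squeeze(-1) * ((policy(s) - a) ** 2).sum(dim=-1)).mean()
    pol_opt.zero_grad()
    loss.backward()
    pol_opt.step()
    return loss.item()
```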
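To ground the Experiment Setup row, the hedged skeleton below wires the reported hyperparameters together: batch size 256, Adam with learning rate 0.0003, 1 million iterations, δ = 0, and a gradient-penalty coefficient of 8 on the discriminator. `sample`, `expert_buffer`, `union_buffer`, `obs_dim`, and `act_dim` are hypothetical placeholders; the WGAN-GP-style interpolated gradient penalty is likewise an assumption about how the reported coefficient is applied.

```python
import torch

def gradient_penalty(disc, expert_sa, union_sa, coef=8.0):
    """Gradient penalty on inputs interpolated between expert and union
    samples (an assumed WGAN-GP-style scheme); coef=8 matches the report."""
    eps = torch.rand(expert_sa.size(0), 1, device=expert_sa.device)
    mid = (eps * expert_sa + (1 - eps) * union_sa).requires_grad_(True)
    grad = torch.autograd.grad(disc(mid).sum(), mid, create_graph=True)[0]
    return coef * ((grad.norm(dim=-1) - 1.0) ** 2).mean()

policy = MLP(obs_dim, act_dim)                 # MLP from the sketch above
discriminator = MLP(obs_dim + act_dim, 1)
pol_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
disc_opt = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
bce = torch.nn.BCEWithLogitsLoss()

for it in range(1_000_000):
    se, ae = sample(expert_buffer, batch_size=256)  # hypothetical sampler
    su, au = sample(union_buffer, batch_size=256)
    expert_sa = torch.cat([se, ae], dim=-1)
    union_sa = torch.cat([su, au], dim=-1)
    # Discriminator step: label expert pairs 1, union pairs 0, add the penalty.
    logit_e, logit_u = discriminator(expert_sa), discriminator(union_sa)
    disc_loss = (bce(logit_e, torch.ones_like(logit_e))
                 + bce(logit_u, torch.zeros_like(logit_u))
                 + gradient_penalty(discriminator, expert_sa, union_sa))
    disc_opt.zero_grad(); disc_loss.backward(); disc_opt.step()
    # Policy step: importance-weighted behavior cloning on the union data.
    isw_bc_step(policy, discriminator, pol_opt, (su, au), delta=0.0)
```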