Sample Efficient Imitation Learning for Continuous Control
Authors: Fumihiro Sasaki, Tetsuya Yohira, Atsuo Kawaguchi
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our algorithm achieves competitive results with GAIL while significantly reducing the environment interactions. |
| Researcher Affiliation | Industry | Fumihiro Sasaki, Tetsuya Yohira & Atsuo Kawaguchi Ricoh Company, Ltd. {fumihiro.fs.sasaki,tetsuya.yohira,atsuo.kawaguchi}@jp.ricoh.com |
| Pseudocode | Yes | Algorithm 1 Overview of our IL algorithm |
| Open Source Code | No | The information is insufficient. The paper states, 'We use publicly available code (https://github.com/openai/imitation) for the implementation of GAIL and BC,' which refers to third-party code, not the authors' own code for their proposed algorithm. |
| Open Datasets | Yes | In our experiments, we aim to answer the following three questions: We use five physics-based control tasks that are simulated with MuJoCo physics simulator (Todorov et al., 2012). We train an agent on each task by TRPO (Schulman et al., 2015a) using the rewards defined in the OpenAI Gym (Brockman et al., 2016) |
| Dataset Splits | No | The information is insufficient. While the paper mentions 'validation rollouts' and describes 'sparse sampling setup' and 'dense sampling setup' for data generation, it does not provide explicit dataset splits (e.g., percentages or fixed counts) for training, validation, and test sets in a static dataset context. |
| Hardware Specification | Yes | All experiments are run on a PC with a 3.30 GHz Intel Core i7-5820k Processor, a GeForce GTX Titan GPU, and 32GB of RAM. |
| Software Dependencies | No | The information is insufficient. The paper mentions using RMSProp and refers to external code for baselines, but does not provide specific software dependencies or library names with version numbers for their own implementation. |
| Experiment Setup | Yes | PN has 100 hidden units in each hidden layer, and its final output is followed by hyperbolic tangent nonlinearity to bound its action range. QN has 500 hidden units in each hidden layer, and its single output is followed by sigmoid nonlinearity to bound the output between [0, 1]. All hidden layers are followed by leaky rectified nonlinearity (Maas et al., 2013). The parameters in all layers are initialized by Xavier initialization (Glorot & Bengio, 2010). The input of PN is given by concatenated vector representations for the state s and noise z. The noise vector, whose dimensionality corresponds to that of the state vector, is generated by a zero-mean normal distribution so that z ∼ P_z = N(0, 1). The input of QN is given by concatenated vector representations for the state s and action a. We employ RMSProp (Hinton et al., 2012) for learning parameters with a decay rate 0.995 and epsilon 10^-8. The learning rates are initially set to 10^-3 for QN and 10^-4 for PN, respectively. The target QN with parameters ν′ is updated so that ν′ ← 10^-3 ν + (1 − 10^-3) ν′ at each update of ν. We linearly decrease the learning rates as the training proceeds. We set the minibatch size of (s_t, a_t, s_{t+1}) triplets to 64, the replay buffer size \|B_β\| = 15000, and the discount factor γ = 0.85. We sample 128 noise vectors for calculating the empirical expectation E_{z∼P_z} of the gradient (6). |
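For reference, below is a minimal PyTorch sketch of the PN/QN setup quoted in the Experiment Setup row. It is not the authors' code: the number of hidden layers (two here) and the state/action dimensions (Hopper-like values) are assumptions, since the paper specifies only the units per hidden layer, the activations, the initialization, the optimizer settings, and the target-network update rule.

```python
# Hypothetical sketch of the described PN/QN architecture and optimizers.
# Assumptions (not stated in the paper): two hidden layers per network,
# and example state/action dimensions.
import torch
import torch.nn as nn


def mlp(sizes, out_act):
    """Build an MLP with leaky-ReLU hidden layers, a given output
    activation, and Xavier-initialized linear layers."""
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.LeakyReLU()]
    layers += [nn.Linear(sizes[-2], sizes[-1]), out_act]
    net = nn.Sequential(*layers)
    for m in net.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)  # Xavier initialization
    return net


state_dim, action_dim = 11, 3  # assumed, task-dependent

# PN: input = concat(state, noise); the noise dim equals the state dim,
# 100 units per hidden layer, tanh output to bound the action range.
pn = mlp([state_dim * 2, 100, 100, action_dim], nn.Tanh())

# QN: input = concat(state, action), 500 units per hidden layer,
# sigmoid output bounded to [0, 1].
qn = mlp([state_dim + action_dim, 500, 500, 1], nn.Sigmoid())
target_qn = mlp([state_dim + action_dim, 500, 500, 1], nn.Sigmoid())
target_qn.load_state_dict(qn.state_dict())

# RMSProp with decay rate (alpha) 0.995 and eps 1e-8;
# initial learning rates: 1e-3 for QN, 1e-4 for PN.
opt_qn = torch.optim.RMSprop(qn.parameters(), lr=1e-3, alpha=0.995, eps=1e-8)
opt_pn = torch.optim.RMSprop(pn.parameters(), lr=1e-4, alpha=0.995, eps=1e-8)


def soft_update(target, source, tau=1e-3):
    """Target QN update: nu' <- tau * nu + (1 - tau) * nu'."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)
```

A forward pass through PN would concatenate the state with a fresh z ∼ N(0, 1) sample of the same dimensionality, e.g. `pn(torch.cat([s, torch.randn_like(s)], dim=-1))`, matching the noise-injection scheme described in the quoted setup.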