State Aware Imitation Learning

Authors: Yannick Schroecker, Charles L. Isbell

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We then evaluate our approach on a tabular domain in section 4.1, comparing our results to a purely supervised approach to imitation learning as well as to sample based inverse reinforcement learning. In section 4.2 we show that SAIL can successfully be applied to learn a neural network policy in a continuous bipedal walker domain and achieves significant improvements over supervised imitation learning in this domain."
Researcher Affiliation | Academia | Yannick Schroecker, College of Computing, Georgia Institute of Technology (yannickschroecker@gatech.edu); Charles Isbell, College of Computing, Georgia Institute of Technology (isbell@cc.gatech.edu)
Pseudocode | Yes | Algorithm 1: State Aware Imitation Learning (a minimal sketch of the update loop follows the table)
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the methodology or a link to a code repository.
Open Datasets | Yes | "The second domain we use is a noisy variation of the bipedal walker domain found in OpenAI Gym [2]." (see the environment-loading sketch after the table)
Dataset Splits | No | The paper describes using a set of 100 episodes from an oracle, or a single successful crossing, as demonstrations, and then collecting unsupervised episodes. However, it does not specify explicit training/validation/test dataset splits with percentages or counts.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components such as neural networks and RMSprop, but it does not specify version numbers for any programming languages, libraries, or frameworks used (e.g., Python, TensorFlow, PyTorch).
Experiment Setup | Yes | "At each iteration, 20 unsupervised sample episodes are collected to estimate the SAIL gradient, using plain stochastic gradient descent with a learning rate of 0.1 for the temporal difference update and RMSprop with a learning rate of 0.01 for updating the policy. [...] To train the network in a purely supervised approach, we use RMSprop over 3000 epochs with a batch size of 128 frames and a learning rate of 10⁻⁵. [...] The ∇θ log dπθ-network is trained using RMSprop with a learning rate of 10⁻⁴, whereas the policy network is trained using RMSprop and a learning rate of 10⁻⁶, starting after the first 1000 episodes." (these settings are collected in the configuration sketch below)
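
The Pseudocode row points to the paper's Algorithm 1. As a reading aid only, here is a minimal tabular sketch of that update loop as described in the paper: an on-policy temporal-difference estimate of ∇θ log dπθ(s) is combined with the supervised gradient ∇θ log πθ(a|s) on demonstration pairs. The tiny random MDP, the demonstration list, and the names used here (policy, grad_log_pi, rollout, g) are illustrative assumptions rather than the authors' code, and plain gradient ascent stands in for the RMSprop policy update reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, H = 5, 3, 40                        # states, actions, episode length (arbitrary toy sizes)

# Random tabular MDP: P[s, a] is a distribution over next states.
P = rng.dirichlet(np.ones(nS), size=(nS, nA))

def policy(theta):
    """Softmax policy pi(a|s) from per-state logits theta[s, a]."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def grad_log_pi(theta, s, a):
    """Gradient of log pi(a|s) w.r.t. theta; nonzero only in row s."""
    g = np.zeros_like(theta)
    g[s] = -policy(theta)[s]
    g[s, a] += 1.0
    return g

def rollout(theta, episodes=20):
    """Collect on-policy transitions (s, a, s'); the paper reports 20 episodes per iteration."""
    pi, transitions = policy(theta), []
    for _ in range(episodes):
        s = rng.integers(nS)
        for _ in range(H):
            a = rng.choice(nA, p=pi[s])
            s_next = rng.choice(nS, p=P[s, a])
            transitions.append((s, a, s_next))
            s = s_next
    return transitions

# Placeholder demonstrations standing in for the oracle state-action pairs.
demos = [(0, 0), (1, 1), (2, 2), (3, 0)]

theta = np.zeros((nS, nA))
g = np.zeros((nS, nS, nA))                  # g[s] estimates grad_theta log d_pi(s)
alpha_td, alpha_pi = 0.1, 0.01              # learning rates reported for the tabular experiment

for iteration in range(200):
    # 1) Temporal-difference estimate of the gradient of the log state distribution:
    #    g(s') <- g(s') + alpha_td * (grad log pi(a|s) + g(s) - g(s'))
    for s, a, s_next in rollout(theta):
        g[s_next] += alpha_td * (grad_log_pi(theta, s, a) + g[s] - g[s_next])
    # 2) Ascend the joint demonstration likelihood: grad log pi(a|s) + grad log d_pi(s).
    step = sum(grad_log_pi(theta, s, a) + g[s] for s, a in demos)
    theta += alpha_pi * step / len(demos)
```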
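
The Open Datasets row refers to a noisy variant of the Gym bipedal walker. For orientation, the snippet below loads today's environment id ("BipedalWalker-v3" via the maintained gymnasium package; the 2017 paper would have used an earlier gym release) and adds Gaussian observation noise as a stand-in, since the paper's exact noise model is not quoted above.

```python
import numpy as np
import gymnasium as gym   # maintained successor of the original OpenAI Gym; needs gymnasium[box2d]

class NoisyObservation(gym.ObservationWrapper):
    """Adds Gaussian noise to observations; the std below is a placeholder, not the paper's value."""
    def __init__(self, env, std=0.1):
        super().__init__(env)
        self.std = std

    def observation(self, obs):
        return obs + np.random.normal(0.0, self.std, size=obs.shape)

env = NoisyObservation(gym.make("BipedalWalker-v3"))
obs, info = env.reset(seed=0)
```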
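
Finally, the Experiment Setup row quotes several optimizer settings. The configuration sketch below collects them in one place; the PyTorch optimizers and the single-layer placeholder networks are my assumptions, and only the learning rates, batch size, epoch count, and episode counts come from the quoted text.

```python
import torch

# Tabular domain (section 4.1): SGD lr 0.1 for the TD update, RMSprop lr 0.01 for the policy,
# with 20 unsupervised episodes collected per iteration.
TABULAR = dict(episodes_per_iter=20, td_lr=0.1, policy_lr=0.01)

# Bipedal walker (section 4.2): supervised pre-training and SAIL settings as quoted.
SUPERVISED = dict(optimizer="RMSprop", epochs=3000, batch_size=128, lr=1e-5)
SAIL = dict(grad_log_d_lr=1e-4, policy_lr=1e-6, policy_updates_start_after_episodes=1000)

# Placeholder architectures (the paper's network shapes are not quoted above).
obs_dim, act_dim = 24, 4                                    # BipedalWalker observation/action sizes
policy_net = torch.nn.Linear(obs_dim, act_dim)
n_policy_params = sum(p.numel() for p in policy_net.parameters())
grad_log_d_net = torch.nn.Linear(obs_dim, n_policy_params)  # predicts grad_theta log d_pi(s) per state

opt_grad_log_d = torch.optim.RMSprop(grad_log_d_net.parameters(), lr=SAIL["grad_log_d_lr"])
opt_policy = torch.optim.RMSprop(policy_net.parameters(), lr=SAIL["policy_lr"])
```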