State Aware Imitation Learning
Authors: Yannick Schroecker, Charles L. Isbell
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then evaluate our approach on a tabular domain in section 4.1, comparing our results to a purely supervised approach to imitation learning as well as to sample based inverse reinforcement learning. In section 4.2 we show that SAIL can successfully be applied to learn a neural network policy in a continuous bipedal walker domain and achieves significant improvements over supervised imitation learning in this domain. |
| Researcher Affiliation | Academia | Yannick Schroecker, College of Computing, Georgia Institute of Technology, yannickschroecker@gatech.edu; Charles Isbell, College of Computing, Georgia Institute of Technology, isbell@cc.gatech.edu |
| Pseudocode | Yes | Algorithm 1 State Aware Imitation Learning (a hedged sketch of this loop follows the table) |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code for the methodology or a link to a code repository. |
| Open Datasets | Yes | The second domain we use is a noisy variation of the bipedal walker domain found in OpenAI Gym [2]. |
| Dataset Splits | No | The paper describes using a set of 100 episodes from an oracle or a single successful crossing as demonstrations, and then collecting unsupervised episodes. However, it does not specify explicit training/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as neural networks and RMSprop, but it does not specify any version numbers for programming languages, libraries, or frameworks used (e.g., Python, TensorFlow, PyTorch). |
| Experiment Setup | Yes | At each iteration, 20 unsupervised sample episodes are collected to estimate the SAIL gradient, using plain stochastic gradient descent with a learning rate of 0.1 for the temporal difference update and RMSprop with a learning rate of 0.01 for updating the policy. [...] To train the network in a purely supervised approach, we use RMSprop over 3000 epochs with a batch size of 128 frames and a learning rate of 10^-5. [...] The ∇θ log dπθ-network is trained using RMSprop with a learning rate of 10^-4, whereas the policy network is trained using RMSprop and a learning rate of 10^-6, starting after the first 1000 episodes. (See the configuration sketch after the table.) |
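
The Pseudocode row above refers to Algorithm 1 but the table does not reproduce it. Below is a minimal sketch of the loop the quoted setup describes, assuming a tabular softmax policy and a hypothetical toy environment: at each iteration, unsupervised episodes are collected with the current policy, a temporal-difference update refines a per-state estimate of ∇θ log dπθ(s), and the policy takes a gradient step that adds this state-distribution term to the supervised imitation term. Only the loop structure and the two step sizes (0.1 and 0.01) come from the quoted text; every name, the environment dynamics, and the demonstration data are assumptions.

```python
"""Minimal sketch of the loop suggested by Algorithm 1 (State Aware Imitation
Learning) and the quoted experiment setup. The environment, shapes, and helper
names are hypothetical; only the overall structure (collect unsupervised
episodes, temporal-difference update of the grad_theta log d(s) estimate, then
a policy step combining the supervised and state terms) and the step sizes
0.1 and 0.01 are taken from the quoted text."""
import numpy as np

n_states, n_actions = 16, 4
rng = np.random.default_rng(0)

theta = np.zeros((n_states, n_actions))                  # tabular softmax policy parameters
grad_log_d = np.zeros((n_states, n_states, n_actions))   # per-state estimate of grad_theta log d(s)

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def grad_log_pi(s, a):
    # Gradient of log softmax policy w.r.t. theta for a single (s, a) pair.
    g = np.zeros_like(theta)
    g[s] = -policy(s)
    g[s, a] += 1.0
    return g

def sample_episode(length=50):
    """Hypothetical environment: cyclic random walk, used only to produce transitions."""
    s = int(rng.integers(n_states))
    traj = []
    for _ in range(length):
        a = int(rng.choice(n_actions, p=policy(s)))
        s_next = (s + a - n_actions // 2) % n_states
        traj.append((s, a, s_next))
        s = s_next
    return traj

demos = [(0, 1), (1, 1), (2, 1)]          # hypothetical demonstrated (state, action) pairs

alpha_td, lr_policy = 0.1, 0.01            # step sizes quoted for the tabular domain
for iteration in range(100):
    # 1. Collect unsupervised episodes with the current policy (20 per iteration in the quote).
    episodes = [sample_episode() for _ in range(20)]

    # 2. Temporal-difference update of the grad_theta log d estimator:
    #    the target at s' is grad_log_pi(a|s) plus the current estimate at the predecessor s.
    for traj in episodes:
        for s, a, s_next in traj:
            td_error = grad_log_pi(s, a) + grad_log_d[s] - grad_log_d[s_next]
            grad_log_d[s_next] += alpha_td * td_error

    # 3. Policy step on the demonstrations: supervised imitation gradient plus the
    #    state-distribution term (plain SGD here; the quote uses RMSprop for this step).
    sail_grad = sum(grad_log_pi(s, a) + grad_log_d[s] for s, a in demos)
    theta += lr_policy * sail_grad
```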
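
The experiment-setup quote mixes hyperparameters for the tabular domain and the bipedal walker. The small sketch below collects the quoted values per domain; the dictionary layout and key names are hypothetical, and only the numeric values and optimizer names come from the quote.

```python
# Hyperparameters quoted in the experiment-setup row, organized per domain.
# Keys and the "supervised_baseline" / "sail" split are hypothetical; the
# numbers and optimizer names are taken from the quoted text.
EXPERIMENTS = {
    "tabular": {
        "episodes_per_iteration": 20,        # unsupervised episodes per SAIL iteration
        "td_optimizer": ("sgd", 1e-1),       # temporal-difference update for grad log d
        "policy_optimizer": ("rmsprop", 1e-2),
    },
    "bipedal_walker": {
        "supervised_baseline": {
            "optimizer": ("rmsprop", 1e-5),
            "epochs": 3000,
            "batch_size": 128,               # frames per batch
        },
        "sail": {
            "grad_log_d_network": ("rmsprop", 1e-4),
            "policy_network": ("rmsprop", 1e-6),
            "policy_updates_start_after": 1000,   # episodes before policy training begins
        },
    },
}
```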