Learning Finite State Representations of Recurrent Policy Networks
Authors: Anurag Koul, Alan Fern, Sam Greydanus
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results of this approach on synthetic environments and six Atari games. |
| Researcher Affiliation | Collaboration | Anurag Koul & Alan Fern School of EECS Oregon State University Corvallis, Oregon, USA {koula,alan.fern}@oregonstate.edu Sam Greydanus Google Brain Mountain View, California, USA sgrey@google.com |
| Pseudocode | No | The paper describes processes and steps but does not include any blocks explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Implementations available @ https://github.com/koulanurag/gym_x |
| Open Datasets | Yes | We present results of this approach on synthetic environments and six Atari games... The Tomita Grammars are popular benchmarks for learning finite state machines (FSMs)... Here we evaluate our approach over the 7 Tomita Grammars... OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | For all of the MCEs in our experiments, the trained RNNs achieve 100% accuracy on the imitation dataset and appeared to produce optimal policies... Table 1 gives the average test score over 50 test episodes... The training dataset is comprised of an equal number of accept/reject strings with lengths uniformly sampled in the range [1,50]. Table 2 presents the test results for the trained RNNs giving the accuracy over a test set of 100 strings drawn from the same distribution as used for training. The paper describes training and test sets but does not describe a validation set or a three-way train/validation/test split. |
| Hardware Specification | No | The paper describes neural network architectures and training algorithms, but does not specify any particular hardware (e.g., GPU/CPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'A3C RL algorithm' and 'Adam optimizer' but does not specify version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | For each MCE instance we use the following recurrent architecture: the input feeds into 1 feed-forward layer with 4 Relu6 nodes (Krizhevsky & Hinton, 2010) (f_t), followed by a 1-layer GRU with 8 hidden units (h_t), followed by a fully connected softmax layer giving a distribution over the M actions (one per mode)... The RNN for each grammar is comprised of a one-layer GRU with 10 hidden units, followed by a fully connected softmax layer with 2 nodes (accept/reject)... using the Adam optimizer and learning rate of 0.001... The network has 4 convolutional layers (kernel size 3, strides 2, padding 1, and 32,32,16,4 filters respectively). We used Relu as the intermediate activation and Relu6 over the last convolutional layer. This is followed by a GRU layer with 32 hidden units and a fully connected layer with n+1 units... We used the A3C RL algorithm (Mnih et al., 2016) (learning rate 10^-4, discount factor 0.99) and computed loss on the policy using Generalized Advantage Estimation (λ = 1.0) (Schulman et al., 2015). |
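
The "Experiment Setup" row describes the Atari policy network only in prose. The sketch below is a minimal, hypothetical PyTorch rendering of that description (4 conv layers with kernel size 3, stride 2, padding 1 and 32/32/16/4 filters, ReLU activations with ReLU6 on the last conv layer, a 32-unit GRU, and a fully connected layer with n+1 outputs). The input channel count, the observation size, and the interpretation of the extra output unit as a value head are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the Atari recurrent policy described in the
# "Experiment Setup" row. Input shape and the policy/value split of the
# final n+1 units are assumptions, not specified in the quoted excerpt.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtariRecurrentPolicy(nn.Module):
    def __init__(self, in_channels: int = 1, n_actions: int = 6, obs_size: int = 80):
        super().__init__()
        # 4 conv layers: kernel 3, stride 2, padding 1, filters 32/32/16/4
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(32, 16, kernel_size=3, stride=2, padding=1)
        self.conv4 = nn.Conv2d(16, 4, kernel_size=3, stride=2, padding=1)
        # Spatial size after four stride-2 convolutions with padding 1.
        feat_size = obs_size
        for _ in range(4):
            feat_size = (feat_size + 1) // 2
        self.gru = nn.GRUCell(4 * feat_size * feat_size, 32)  # 32 hidden units
        self.head = nn.Linear(32, n_actions + 1)  # n+1 units (assumed: logits + value)

    def forward(self, obs: torch.Tensor, h: torch.Tensor):
        x = F.relu(self.conv1(obs))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu6(self.conv4(x))          # ReLU6 on the last conv layer
        x = x.view(x.size(0), -1)
        h = self.gru(x, h)                  # recurrent hidden state
        out = self.head(h)
        logits, value = out[:, :-1], out[:, -1]
        return F.softmax(logits, dim=-1), value, h
```

Training this sketch with A3C (learning rate 10^-4, discount 0.99) and GAE (λ = 1.0), as quoted above, would require a separate RL loop; only the network architecture is shown here.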