Learning Finite State Representations of Recurrent Policy Networks
Authors: Anurag Koul, Alan Fern, Sam Greydanus
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results of this approach on synthetic environments and six Atari games. |
| Researcher Affiliation | Collaboration | Anurag Koul & Alan Fern School of EECS Oregon State University Corvallis, Oregon, USA {koula,alan.fern}@oregonstate.edu Sam Greydanus Google Brain Mountain View, California, USA sgrey@google.com |
| Pseudocode | No | The paper describes processes and steps but does not include any blocks explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Implementations available @ https://github.com/koulanurag/gym_x |
| Open Datasets | Yes | We present results of this approach on synthetic environments and six Atari games... The Tomita Grammars are popular benchmarks for learning finite state machines (FSMs)... Here we evaluate our approach over the 7 Tomita Grammars... OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | For all of the MCEs in our experiments, the trained RNNs achieve 100% accuracy on the imitation dataset and appeared to produce optimal policies... Table 1 gives the average test score over 50 test episodes... The training dataset is comprised of an equal number of accept/reject strings with lengths uniformly sampled in the range [1,50]. Table 2 presents the test results for the trained RNNs giving the accuracy over a test set of 100 strings drawn from the same distribution as used for training. The paper describes training and test sets but does not describe a validation set or a three-way train/validation/test split. |
| Hardware Specification | No | The paper describes neural network architectures and training algorithms, but does not specify any particular hardware (e.g., GPU/CPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'A3C RL algorithm' and 'Adam optimizer' but does not specify version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | For each MCE instance we use the following recurrent architecture: the input feeds into 1 feed-forward layer with 4 Relu6 nodes (Krizhevsky & Hinton, 2010) (f_t), followed by a 1-layer GRU with 8 hidden units (h_t), followed by a fully connected softmax layer giving a distribution over the M actions (one per mode)... The RNN for each grammar is comprised of a one-layer GRU with 10 hidden units, followed by a fully connected softmax layer with 2 nodes (accept/reject)... using the Adam optimizer and learning rate of 0.001... The network has 4 convolutional layers (kernel size 3, strides 2, padding 1, and 32,32,16,4 filters respectively). We used Relu as the intermediate activation and Relu6 over the last convolutional layer. This is followed by a GRU layer with 32 hidden units and a fully connected layer with n+1 units... We used the A3C RL algorithm (Mnih et al., 2016) (learning rate 10^-4, discount factor 0.99) and computed loss on the policy using Generalized Advantage Estimation (λ = 1.0) (Schulman et al., 2015). |
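
The "Experiment Setup" row describes the Atari policy network only in prose. The sketch below is a minimal, hypothetical PyTorch rendering of that description (4 conv layers with kernel size 3, stride 2, padding 1 and 32/32/16/4 filters, ReLU activations with ReLU6 on the last conv layer, a 32-unit GRU, and a fully connected layer with n+1 outputs). The input channel count, the observation size, and the interpretation of the extra output unit as a value head are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the Atari recurrent policy described in the
# "Experiment Setup" row. Input shape and the policy/value split of the
# final n+1 units are assumptions, not specified in the quoted excerpt.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtariRecurrentPolicy(nn.Module):
    def __init__(self, in_channels: int = 1, n_actions: int = 6, obs_size: int = 80):
        super().__init__()
        # 4 conv layers: kernel 3, stride 2, padding 1, filters 32/32/16/4
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(32, 16, kernel_size=3, stride=2, padding=1)
        self.conv4 = nn.Conv2d(16, 4, kernel_size=3, stride=2, padding=1)
        # Spatial size after four stride-2 convolutions with padding 1.
        feat_size = obs_size
        for _ in range(4):
            feat_size = (feat_size + 1) // 2
        self.gru = nn.GRUCell(4 * feat_size * feat_size, 32)  # 32 hidden units
        self.head = nn.Linear(32, n_actions + 1)  # n+1 units (assumed: logits + value)

    def forward(self, obs: torch.Tensor, h: torch.Tensor):
        x = F.relu(self.conv1(obs))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu6(self.conv4(x))          # ReLU6 on the last conv layer
        x = x.view(x.size(0), -1)
        h = self.gru(x, h)                  # recurrent hidden state
        out = self.head(h)
        logits, value = out[:, :-1], out[:, -1]
        return F.softmax(logits, dim=-1), value, h
```

Training this sketch with A3C (learning rate 10^-4, discount 0.99) and GAE (λ = 1.0), as quoted above, would require a separate RL loop; only the network architecture is shown here.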