Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks

Authors: Bo Chang, Minmin Chen, Eldad Haber, Ed H. Chi

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive simulations and experiments to demonstrate the benefits of this new RNN architecture. Antisymmetric RNN exhibits well-behaved dynamics and outperforms the regular LSTM model on tasks requiring long-term memory, and matches its performance on tasks where short-term dependencies dominate with much fewer parameters." … "5 EXPERIMENTS: The performance of the proposed antisymmetric networks is evaluated on four image classification tasks with long-range dependencies."
Researcher Affiliation | Collaboration | Bo Chang (University of British Columbia, Vancouver, BC, Canada); Minmin Chen (Google Brain, Mountain View, CA, USA); Eldad Haber (University of British Columbia, Vancouver, BC, Canada); Ed H. Chi (Google Brain, Mountain View, CA, USA)
Pseudocode | No |
Open Source Code | No |
Open Datasets | Yes | "5.1 PIXEL-BY-PIXEL MNIST: In the first task, we learn to classify the MNIST digits by pixels (LeCun et al., 1998)." … "5.2 PIXEL-BY-PIXEL CIFAR-10: The CIFAR-10 dataset contains 32×32 colour images in 10 classes (Krizhevsky & Hinton, 2009)."
Dataset Splits | Yes | "We use the standard train/test split of MNIST and CIFAR-10."
Hardware Specification | No |
Software Dependencies | No |
Experiment Setup | Yes | "C EXPERIMENTAL DETAILS: Let m be the input dimension and n be the number of hidden units. The input-to-hidden matrices are initialized to N(0, 1/m). The hidden-to-hidden matrices are initialized to N(0, σw²/n), where σw is chosen from σw ∈ {0, 1, 2, 4, 8, 16}. The bias terms are initialized to zero, except the forget gate bias of the LSTM, which is initialized to 1, as suggested by Jozefowicz et al. (2015). For Antisymmetric RNNs, the step size ϵ ∈ {0.01, 0.1, 1} and diffusion γ ∈ {0.001, 0.01, 0.1, 1.0}. We use SGD with momentum and Adagrad (Duchi et al., 2011) as optimizers, with a batch size of 128 and a learning rate chosen from {0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1}. On MNIST and pixel-by-pixel CIFAR-10, all models are trained for 50,000 iterations. On noise-padded CIFAR-10, models are trained for 10,000 iterations."
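The initialization scheme and the antisymmetric update quoted above can be sketched in NumPy as follows. This is our illustrative reconstruction, not the authors' code (no implementation was released): the function names are ours, and the forward-Euler cell update h ← h + ϵ·tanh((W − Wᵀ − γI)h + Vx + b) follows the basic (gate-free) AntisymmetricRNN described in the paper.

```python
import numpy as np

def init_params(m, n, sigma_w=1.0, rng=None):
    """Initialize parameters per the quoted experimental details.

    m: input dimension, n: number of hidden units.
    Input-to-hidden V ~ N(0, 1/m); hidden-to-hidden W ~ N(0, sigma_w^2 / n);
    biases start at zero.
    """
    rng = rng or np.random.default_rng(0)
    V = rng.normal(0.0, np.sqrt(1.0 / m), size=(n, m))
    W = rng.normal(0.0, np.sqrt(sigma_w**2 / n), size=(n, n))
    b = np.zeros(n)
    return V, W, b

def antisymmetric_step(h, x, V, W, b, eps=0.1, gamma=0.01):
    """One forward-Euler step of the AntisymmetricRNN cell.

    The effective hidden-to-hidden matrix (W - W.T) is antisymmetric by
    construction; gamma adds diffusion (a small negative shift on the
    eigenvalues' real parts), and eps is the discretization step size.
    """
    A = W - W.T - gamma * np.eye(W.shape[0])
    return h + eps * np.tanh(A @ h + V @ x + b)
```

Note that `W - W.T` is antisymmetric regardless of how `W` is drawn, so the stability argument in the paper applies for every setting of σw in the grid; σw, ϵ, and γ were then tuned over the quoted ranges.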