State-Regularized Recurrent Neural Networks

Authors: Cheng Wang, Mathias Niepert

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support our hypotheses through experiments both on synthetic and real-world datasets. We explore the improvement of the extrapolation capabilities of SR-RNNs and closely investigate their memorization behavior.
Researcher Affiliation | Industry | Cheng Wang and Mathias Niepert, NEC Laboratories Europe, Heidelberg, Germany. Correspondence to: Cheng Wang <cheng.wang@neclab.eu>.
Pseudocode | Yes | Due to space constraints, the pseudo-code of the extraction algorithm is listed in the Supplementary Material. (A rough sketch of the extraction idea follows the table.)
Open Source Code | Yes | An implementation of SR-RNNs is available at https://github.com/deepsemantic/sr-rnns.
Open Datasets | Yes | We evaluate the DFA extraction algorithm for SR-RNNs on RNNs trained on the Tomita grammars (Tomita, 1982)... (Membership tests for a few Tomita grammars are sketched after the table.)
Dataset Splits | Yes | We created two datasets for BP. A large one with 22,286 training sequences (positive: 13,025; negative: 9,261) and 6,704 validation sequences (positive: 3,582; negative: 3,122). The small dataset consists of 1,008 training sequences (positive: 601; negative: 407) and 268 validation sequences (positive: 142; negative: 126). (A toy BP data generator is sketched after the table.)
Hardware Specification | Yes | All experiments were conducted on a single Titan Xp with 12 GB memory.
Software Dependencies | No | The paper mentions Theano but does not provide version numbers for it or for any other software dependency.
Experiment Setup | Yes | Unless otherwise indicated we always (a) use single-layer RNNs, (b) learn an embedding for input tokens before feeding it to the RNNs, (c) apply ADADELTA (Zeiler, 2012) for regular languages and RMSPROP (Tieleman & Hinton, 2012) with a learning rate of 0.01 and momentum of 0.9 for the rest; (d) do not use dropout (Srivastava et al., 2014) or batch normalization (Cooijmans et al., 2017) of any kind; and (e) use state-regularized RNNs based on Equations 3 & 5 with a temperature of τ = 1 (standard softmax). (A configuration sketch follows the table.)
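The actual extraction pseudo-code is in the paper's Supplementary Material; the following is only an unofficial sketch of the idea as described in the main text. Because state regularization pulls every hidden state toward one of k learned centroids, the centroids can be treated as DFA states, and a transition table can be read off by applying the trained cell to each (centroid, token) pair and taking the most probable successor centroid. `cell`, `embed`, and `centroids` are placeholders for the trained model's components, not the authors' API.

```python
import numpy as np

def extract_dfa(cell, embed, centroids, alphabet):
    """Unofficial sketch of DFA extraction from a trained SR-RNN.

    cell(x, h) -> next hidden state (trained RNN cell; placeholder)
    embed(a)   -> embedding vector of token a (placeholder)
    centroids  -> (k, d) array of learned centroids, one per DFA state
    Returns delta: dict mapping (state, token) -> successor state.
    """
    k = centroids.shape[0]
    delta = {}
    for i in range(k):                          # each centroid acts as a DFA state
        for a in alphabet:
            h = cell(embed(a), centroids[i])    # one RNN step starting from state i
            delta[(i, a)] = int(np.argmax(centroids @ h))  # most probable successor
    return delta  # states unreachable from the start state can be pruned afterwards
```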
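For reference, the Tomita grammars are seven standard regular languages over the alphabet {0, 1}. Below is a minimal sketch of membership tests for a few of them, using the formulations common in the literature; the paper's exact data-generation procedure is not shown here.

```python
import re

# Membership predicates for a few of the seven Tomita grammars
# (standard formulations from the literature; alphabet is {0, 1}).
TOMITA = {
    1: lambda s: re.fullmatch(r"1*", s) is not None,               # 1*
    2: lambda s: re.fullmatch(r"(10)*", s) is not None,            # (10)*
    4: lambda s: "000" not in s,                                   # no "000" substring
    5: lambda s: s.count("0") % 2 == 0 and s.count("1") % 2 == 0,  # even #0s and #1s
    7: lambda s: re.fullmatch(r"0*1*0*1*", s) is not None,         # 0*1*0*1*
}

assert TOMITA[2]("101010") and not TOMITA[2]("1010100")
assert TOMITA[5]("1001") and not TOMITA[5]("100")
```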
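Here, BP denotes the balanced-parentheses language. The paper's exact generation procedure (alphabet, length distribution, negative sampling) is not given in this excerpt, so the generator below is an illustrative assumption only.

```python
import random

def is_balanced(s):
    """Balance check for a single bracket pair."""
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def make_bp_dataset(n, max_len=20, seed=0):
    """Toy BP dataset: random bracket strings labeled balanced/unbalanced.
    Note: naive uniform sampling skews negative; the paper's sets are
    roughly class-balanced, so its sampling is presumably different."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        length = rng.randint(2, max_len)
        s = "".join(rng.choice("()") for _ in range(length))
        data.append((s, int(is_balanced(s))))
    return data

train, val = make_bp_dataset(1008), make_bp_dataset(268, seed=1)
```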
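Below is a minimal PyTorch sketch of that configuration for the non-regular-language experiments: a single-layer GRU over token embeddings, a state-regularization step per time step (my reading of Equations 3 and 5: softmax attention over k learnable centroids at temperature τ, followed by a convex combination of the centroids), and RMSprop with learning rate 0.01 and momentum 0.9, with no dropout or batch normalization. The paper's implementation is in Theano; all layer sizes and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRGRUClassifier(nn.Module):
    """Single-layer GRU with a state-regularization step at every time step."""
    def __init__(self, vocab_size, emb_dim=100, hid_dim=100, k=10, tau=1.0, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)           # (b) token embeddings
        self.cell = nn.GRUCell(emb_dim, hid_dim)                 # (a) single-layer RNN
        self.centroids = nn.Parameter(torch.randn(k, hid_dim))   # k learnable centroids
        self.tau = tau                                           # (e) temperature tau = 1
        self.out = nn.Linear(hid_dim, n_classes)

    def forward(self, tokens):                                   # tokens: (batch, seq_len)
        x = self.embed(tokens)
        h = x.new_zeros(tokens.size(0), self.cell.hidden_size)
        for t in range(tokens.size(1)):
            u = self.cell(x[:, t], h)
            # Reading of Eq. 3: softmax attention over centroids at temperature tau;
            # reading of Eq. 5: next state is the convex combination of centroids.
            alpha = F.softmax(u @ self.centroids.t() / self.tau, dim=-1)
            h = alpha @ self.centroids
        return self.out(h)

model = SRGRUClassifier(vocab_size=4)
# (c) RMSprop with lr 0.01 and momentum 0.9; (d) no dropout or batch normalization.
opt = torch.optim.RMSprop(model.parameters(), lr=0.01, momentum=0.9)
```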