State-Regularized Recurrent Neural Networks
Authors: Cheng Wang, Mathias Niepert
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our hypotheses through experiments both on synthetic and real-world datasets. We explore the improvement of the extrapolation capabilities of SR-RNNs and closely investigate their memorization behavior. |
| Researcher Affiliation | Industry | Cheng Wang 1 Mathias Niepert 1 1NEC Laboratories Europe, Heidelberg, Germany. Correspondence to: Cheng Wang <cheng.wang@neclab.eu>. |
| Pseudocode | Yes | Due to space constraints, the pseudo-code of the extraction algorithm is listed in the Supplementary Material. |
| Open Source Code | Yes | An implementation of SR-RNNs is available at https://github.com/deepsemantic/sr-rnns. |
| Open Datasets | Yes | We evaluate the DFA extraction algorithm for SR-RNNs on RNNs trained on the Tomita grammars (Tomita, 1982)... |
| Dataset Splits | Yes | We created two datasets for BP. A large one with 22,286 training sequences (positive: 13,025; negative: 9,261) and 6,704 validation sequences (positive: 3,582; negative: 3,122). The small dataset consists of 1,008 training sequences (positive: 601; negative: 407), and 268 validation sequences (positive: 142; negative: 126). |
| Hardware Specification | Yes | All experiments were conducted on a single Titan Xp with 12G memory. |
| Software Dependencies | No | The paper mentions 'Theano' but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | Unless otherwise indicated we always (a) use single-layer RNNs, (b) learn an embedding for input tokens before feeding it to the RNNs, (c) apply ADADELTA (Zeiler, 2012) for regular language and RMSPROP (Tieleman & Hinton, 2012) with a learning rate of 0.01 and momentum of 0.9 for the rest; (d) do not use dropout (Srivastava et al., 2014) or batch normalization (Cooijmans et al., 2017) of any kind; and (e) use state-regularized RNNs based on equations 3&5 with a temperature of τ = 1 (standard softmax). |
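The setup row above reports that the state-regularization component (equations 3 and 5 of the paper) uses a temperature-τ softmax with τ = 1, i.e. the standard softmax. A minimal NumPy sketch of that step, as we understand it from the quoted setup: the hidden state is mapped to a convex combination of k learnable centroid states, weighted by a softmax over similarity scores. The function and variable names here are our own, not the authors' implementation.

```python
import numpy as np

def state_regularize(h, centroids, tau=1.0):
    """Map a hidden state h (d,) to a convex combination of k learnable
    centroids (k, d), weighting by a temperature-tau softmax over
    dot-product similarities. tau=1 is the standard softmax, matching
    the reported setup; larger tau flattens the weights."""
    scores = centroids @ h / tau      # (k,) similarity scores
    scores -= scores.max()            # shift for numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum()              # softmax weights, sum to 1
    return alpha @ centroids          # regularized hidden state (d,)

rng = np.random.default_rng(0)
h = rng.normal(size=8)               # hidden state, d = 8
centroids = rng.normal(size=(5, 8))  # k = 5 centroid states
h_reg = state_regularize(h, centroids, tau=1.0)
print(h_reg.shape)  # (8,)
```

With a low temperature the weights concentrate on the nearest centroid, which is what makes the hidden states cluster and permits the DFA extraction described in the pseudocode row.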