A Tensor Decomposition Perspective on Second-order RNNs

Authors: Maude Lizaire, Michael Rizvi-Martel, Marawan Gamal, Guillaume Rabusseau

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support these results empirically with experiments on the Penn Treebank dataset which demonstrate that, with a fixed parameter budget, CPRNNs outperform RNNs, 2RNNs, and MIRNNs with the right choice of rank and hidden size.
Researcher Affiliation | Academia | Mila & DIRO, Université de Montréal, Montreal, Canada; CIFAR AI Chair. Correspondence to: Maude Lizaire <maude.lizaire@umontreal.ca>, Guillaume Rabusseau <grabus@iro.umontreal.ca>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code base for this paper can be found at https://github.com/MaudeLiz/cprnn
Open Datasets | Yes | We perform experiments on the Penn Treebank dataset (Marcus et al., 1993) measuring bits-per-character (BPC) using the same train/valid/test partition as in Mikolov et al. (2012).
Dataset Splits | Yes | We perform experiments on the Penn Treebank dataset (Marcus et al., 1993) measuring bits-per-character (BPC) using the same train/valid/test partition as in Mikolov et al. (2012).
Hardware Specification | No | The paper mentions 'material support from NVIDIA Corporation in the form of computational resources' but does not specify the exact hardware (e.g., specific GPU or CPU models, memory details) used for the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer and tanh activation function, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | All models were trained using truncated backpropagation through time (Werbos, 1990) with a sequence length of 50, a batch size of 128, and the Adam optimizer (Kingma & Ba, 2015) to minimize the negative log-likelihood. Initial weights were drawn from a uniform random distribution U[-1/n, 1/n]. For all experiments, we use the tanh activation function. For training, we use early stopping and a scheduler that reduces the learning rate (initialized at 0.001) by half on plateaus of the validation loss.
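
To make the quoted setup concrete, the following is a minimal PyTorch sketch of that training configuration. It is not the authors' code: the CPRNN architecture is replaced by a plain nn.RNN placeholder, and the hidden size, vocabulary size, and helper names (CharRNN, tbptt_step) are assumptions for illustration. Only the reported hyperparameters (sequence length 50, batch size 128, Adam at 0.001, halving on validation-loss plateaus, tanh activation, uniform initialization, negative log-likelihood, bits-per-character) are taken from the table above.

```python
# Hedged sketch of the described training configuration (not the authors' implementation).
# A plain tanh nn.RNN stands in for the paper's CPRNN; sizes below are assumed for illustration.
import math
import torch
import torch.nn as nn

SEQ_LEN, BATCH_SIZE = 50, 128          # from the paper's reported setup
HIDDEN_SIZE, VOCAB_SIZE = 256, 50      # assumed values, not taken from the paper

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, nonlinearity="tanh", batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)
        # Initial weights drawn from a uniform distribution U[-1/n, 1/n] (as quoted above).
        bound = 1.0 / hidden_size
        for p in self.parameters():
            nn.init.uniform_(p, -bound, bound)

    def forward(self, x, h=None):
        out, h = self.rnn(self.embed(x), h)
        return self.out(out), h

model = CharRNN(VOCAB_SIZE, HIDDEN_SIZE)
criterion = nn.CrossEntropyLoss()                        # negative log-likelihood over characters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(  # halve the LR on validation-loss plateaus
    optimizer, factor=0.5)                               # call scheduler.step(val_loss) after validation

def tbptt_step(x, y, h):
    """One truncated-BPTT update on a (BATCH_SIZE, SEQ_LEN) chunk of character indices."""
    h = h.detach() if h is not None else None            # truncate gradients at chunk boundaries
    logits, h = model(x, h)
    loss = criterion(logits.reshape(-1, VOCAB_SIZE), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), h

# Bits-per-character is the per-character cross-entropy expressed in base 2.
bpc = lambda nll: nll / math.log(2)
```

Early stopping is left out of the sketch; in practice one would track the validation loss each epoch, pass it to scheduler.step, and stop once it no longer improves.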