A Clockwork RNN
Authors: Jan Koutník, Klaus Greff, Faustino Gomez, Jürgen Schmidhuber
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The network is demonstrated in preliminary experiments involving three tasks: audio signal generation, TIMIT spoken word classification, where it outperforms both SRN and LSTM networks, and online handwriting recognition, where it outperforms SRNs. |
| Researcher Affiliation | Academia | Jan Koutník KOU@IDSIA.CH Klaus Greff KLAUS@IDSIA.CH Faustino Gomez TINO@IDSIA.CH Jürgen Schmidhuber JUERGEN@IDSIA.CH IDSIA, USI&SUPSI, Manno-Lugano, CH-6928, Switzerland |
| Pseudocode | No | The paper describes the CW-RNN architecture and its calculations using equations and descriptive text, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about making its source code available or provide a link to a code repository. |
| Open Datasets | Yes | Each sequence contains an audio signal of one spoken word from the TIMIT Speech Recognition Benchmark (Garofolo et al., 1993). The dataset (Liwicki & Bunke, 2005) consists of 5364 hand-written lines of text in the training set and 3859 lines in the test set, and two validation sets that were combined to form one validation set of size 2956. |
| Dataset Splits | Yes | The dataset contains 25 different words (classes) arranged in 5 groups based on their phonetic suffix. For every word there are 7 examples from different speakers, which were partitioned into 5 for training and 2 for testing, for a total of 175 sequences (125 train, 50 test). The dataset (Liwicki & Bunke, 2005) consists of 5364 hand-written lines of text in the training set and 3859 lines in the test set, and two validation sets that were combined to form one validation set of size 2956. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions methods like Stochastic Gradient Descent (SGD) with Nesterov-style momentum, tanh activation function, and connectionist temporal classification (CTC), but it does not specify any software libraries or their version numbers (e.g., TensorFlow, PyTorch, scikit-learn versions). |
| Experiment Setup | Yes | Initial values for all the weights were drawn from a Gaussian distribution with zero mean and standard deviation of 0.1. Initial values of all internal state variables for all hidden activations were set to 0. Each setup was run 100 times with different random initialization of parameters. All networks were trained using Stochastic Gradient Descent (SGD) with Nesterov-style momentum (Sutskever et al., 2013). All networks used the same architecture: no inputs, one hidden layer and a single linear output neuron. Each network type was run with 4 different sizes: 100, 250, 500, and 1000 parameters. The networks were trained for 2000 epochs to minimize the mean squared error. Momentum was set to 0.95, with a learning rate that was optimized separately for each method, but kept the same for all network sizes: 3.0 × 10⁻⁴ for SRN and CW-RNN, and 3.0 × 10⁻⁵ for LSTM. For LSTM it was also crucial to initialize the bias of the forget gates to a high value (5.0 in this case). The hidden units of CW-RNN were divided into nine approximately equally sized groups with exponential clock-timings {1, 2, 4, ..., 256}. |
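
Since the paper describes the CW-RNN update only through equations and prose (no pseudocode, per the Pseudocode row), the following is a minimal NumPy sketch of one clockwork update step under the setup quoted in the Experiment Setup row. The function name `cw_rnn_step`, the module size `block`, and the use of an input vector are illustrative assumptions, not code from the authors; only the nine modules, the exponential clock periods {1, 2, 4, ..., 256}, the N(0, 0.1) weight initialization, and the zero-initialized state are taken from the paper.

```python
import numpy as np

# Sketch of one CW-RNN forward step (names and sizes are illustrative assumptions).
# Hidden units are split into modules; module i has clock period T_i and is updated
# only at time steps t where t % T_i == 0. With T_i = 2**i this reproduces the
# exponential clock-timings {1, 2, 4, ..., 256} reported in the experiments.

def cw_rnn_step(x_t, h_prev, W_H, W_I, b_H, periods, block, t):
    """One forward step of a clockwork RNN.

    x_t     : (n_in,)        input at time t
    h_prev  : (n_hid,)       previous hidden state
    W_H     : (n_hid, n_hid) recurrent weights; rows of module i connect only to
                             modules with the same or larger period
    W_I     : (n_hid, n_in)  input weights
    b_H     : (n_hid,)       hidden bias
    periods : clock periods, one per module (e.g. [1, 2, 4, ..., 256])
    block   : hidden units per module (n_hid // len(periods))
    t       : current time step
    """
    h_new = h_prev.copy()
    for i, T_i in enumerate(periods):
        if t % T_i != 0:
            continue                      # inactive module keeps its previous state
        rows = slice(i * block, (i + 1) * block)
        h_new[rows] = np.tanh(W_H[rows] @ h_prev + W_I[rows] @ x_t + b_H[rows])
    return h_new


# Illustrative usage with nine modules and exponential periods; the module size and
# input dimension here are arbitrary, not the paper's exact parameter counts.
periods = [2 ** i for i in range(9)]      # {1, 2, 4, ..., 256}
block, n_in = 4, 3
n_hid = block * len(periods)

rng = np.random.default_rng(0)
W_H = rng.normal(0.0, 0.1, (n_hid, n_hid))   # weights drawn from N(0, 0.1), as reported
W_I = rng.normal(0.0, 0.1, (n_hid, n_in))
b_H = np.zeros(n_hid)

# Enforce the block-upper-triangular recurrent structure: module i receives
# connections only from modules j >= i (equal or slower clocks).
for i in range(len(periods)):
    W_H[i * block:(i + 1) * block, : i * block] = 0.0

h = np.zeros(n_hid)                       # internal state initialized to 0, as reported
x = rng.normal(size=n_in)
for t in range(1, 11):
    h = cw_rnn_step(x, h, W_H, W_I, b_H, periods, block, t)
```

Note that the audio-generation experiments quoted above use no inputs at all; the input term is kept in the sketch only to show the general form of the update.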