Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks

Authors: Minhyung Cho, Chandra Dhir, Jaehyung Lee

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results for offline handwriting and phoneme recognition show that an MDRNN with HF optimization performs better as the depth of the network increases, up to 15 layers.
Researcher Affiliation | Collaboration | Minhyung Cho, Chandra Shekhar Dhir, Jaehyung Lee; Applied Research Korea, Gracenote Inc.; {mhyung.cho,shekhardhir}@gmail.com, jaehyung.lee@kaist.ac.kr
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | The IFN/ENIT database [20] is a database of handwritten Arabic words consisting of 32,492 images. The TIMIT corpus [21] is a benchmark database for evaluating speech recognition performance.
Dataset Splits | Yes | The 25,955 images corresponding to subsets b-e were used for training. The validation set consisted of 3,269 images, the first half of set a sorted in alphabetical order (ae07_001.tif to ai54_028.tif). For the TIMIT corpus, the standard training, validation, and core test sets were used, containing 3,696, 400, and 192 sentences, respectively. (A split-construction sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not name the ancillary software (e.g., libraries or solvers, with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | For HF optimization, we followed the basic setup described in [8], but different parameters were utilized. Tikhonov damping was used together with Levenberg-Marquardt heuristics. The damping parameter λ was initialized to 0.1 and adjusted according to the reduction ratio ρ (multiplied by 0.9 if ρ > 0.75, divided by 0.9 if ρ < 0.25, and unchanged otherwise). For SGD optimization, the learning rate ϵ was chosen from {10^-4, 10^-5, 10^-6} and the momentum µ from {0.9, 0.95, 0.99}. Gaussian weight noise of standard deviation σ ∈ {0.03, 0.04, 0.05} was applied together with L2 regularization of strength 0.001. (A sketch of the damping schedule and hyperparameter grid follows the table.)
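
The Levenberg-Marquardt damping schedule quoted in the Experiment Setup row can be written in a few lines. The sketch below is only an illustration of that rule under the stated parameters, not the authors' implementation; the names adjust_damping and sgd_grid are assumptions.

```python
# Minimal sketch of the Tikhonov-damping update with the Levenberg-Marquardt
# heuristic described above. Function and variable names are illustrative
# assumptions, not taken from the paper's code.

def adjust_damping(lmbda, rho):
    """Update the damping parameter lambda from the reduction ratio rho.

    rho compares the actual decrease of the objective with the decrease
    predicted by the local quadratic model solved in each HF/CG step.
    """
    if rho > 0.75:      # model fits well -> relax damping
        return lmbda * 0.9
    if rho < 0.25:      # model fits poorly -> strengthen damping
        return lmbda / 0.9
    return lmbda        # otherwise keep it unchanged

lmbda = 0.1             # initial value reported in the paper
for rho in (0.8, 0.5, 0.1):
    lmbda = adjust_damping(lmbda, rho)   # 0.09 -> 0.09 -> 0.1

# Hyperparameter grid reported for the SGD baseline.
sgd_grid = {
    "learning_rate": [1e-4, 1e-5, 1e-6],
    "momentum": [0.9, 0.95, 0.99],
    "weight_noise_std": [0.03, 0.04, 0.05],  # Gaussian weight noise sigma
    "l2_strength": 0.001,
}
```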
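
Similarly, the IFN/ENIT split quoted in the Dataset Splits row (sets b-e for training, first half of alphabetically sorted set a for validation) could be constructed along the lines of the sketch below. The directory layout and helper names are assumptions, not details given in the paper.

```python
# Hypothetical reconstruction of the IFN/ENIT split described above.
# The directory layout (ifnenit/set_a ... set_e) is an assumption.
import os

IFNENIT_ROOT = "ifnenit"

def list_images(set_name):
    """Return the alphabetically sorted .tif file names of one subset."""
    directory = os.path.join(IFNENIT_ROOT, f"set_{set_name}")
    return sorted(f for f in os.listdir(directory) if f.lower().endswith(".tif"))

# Subsets b-e form the training set (25,955 images in total per the paper).
train_files = [os.path.join(f"set_{s}", f)
               for s in ("b", "c", "d", "e")
               for f in list_images(s)]

# Validation: first half of set a in alphabetical order
# (3,269 images, ae07_001.tif to ai54_028.tif per the paper).
set_a = list_images("a")
val_files = set_a[: len(set_a) // 2]
```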