Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
Authors: Minhyung Cho, Chandra Dhir, Jaehyung Lee
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on offline handwriting and phoneme recognition show that an MDRNN trained with HF optimization performs better as network depth increases, up to 15 layers. |
| Researcher Affiliation | Collaboration | Minhyung Cho, Chandra Shekhar Dhir, Jaehyung Lee; Applied Research Korea, Gracenote Inc.; {mhyung.cho,shekhardhir}@gmail.com, jaehyung.lee@kaist.ac.kr |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | The IFN/ENIT Database [20] is a database of handwritten Arabic words, which consists of 32,492 images. The TIMIT corpus [21] is a benchmark database for evaluating speech recognition performance. |
| Dataset Splits | Yes | The 25,955 images corresponding to the subsets (b, c, d, e) were used for training. The validation set consisted of 3,269 images corresponding to the first half of the sorted list in alphabetical order (ae07_001.tif to ai54_028.tif) in set a (a minimal split sketch appears below the table). For the TIMIT corpus, the standard training, validation, and core test sets were used, containing 3,696, 400, and 192 sentences, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For HF optimization, we followed the basic setup described in [8], but different parameters were utilized. Tikhonov damping was used together with the Levenberg-Marquardt heuristic. The damping parameter λ was initialized to 0.1 and adjusted according to the reduction ratio ρ (multiplied by 0.9 if ρ > 0.75, divided by 0.9 if ρ < 0.25, and unchanged otherwise); a sketch of this schedule follows the table. For SGD optimization, the learning rate ϵ was chosen from {10⁻⁴, 10⁻⁵, 10⁻⁶}, and the momentum µ from {0.9, 0.95, 0.99}. We applied Gaussian weight noise with standard deviation σ chosen from {0.03, 0.04, 0.05}, together with L2 regularization of strength 0.001. |
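As a reading aid for the IFN/ENIT split described in the Dataset Splits row, here is a minimal Python sketch that reproduces the partitioning logic: sets b–e for training, and the first alphabetical half of set a for validation. The directory layout and function name are assumptions for illustration, not taken from the paper.

```python
import os

def ifnenit_splits(root):
    """Hypothetical split following the paper's description:
    sets b-e for training, first alphabetical half of set a for validation."""
    train = []
    for subset in ["b", "c", "d", "e"]:  # sets b-e: 25,955 images in total
        subset_dir = os.path.join(root, subset)
        train += sorted(os.path.join(subset_dir, f) for f in os.listdir(subset_dir))

    # Set a, sorted alphabetically; the first half (3,269 images,
    # ae07_001.tif to ai54_028.tif) forms the validation set.
    set_a = sorted(os.listdir(os.path.join(root, "a")))
    valid = [os.path.join(root, "a", f) for f in set_a[: len(set_a) // 2]]
    return train, valid
```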
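The Levenberg-Marquardt damping schedule quoted in the Experiment Setup row is simple enough to state in a few lines. The sketch below shows only that update rule; variable names are illustrative, and ρ is the reduction ratio (actual decrease of the objective divided by the decrease predicted by the local quadratic model).

```python
def update_damping(lam, rho):
    """Adjust the Tikhonov damping parameter as described in the paper:
    multiply by 0.9 if rho > 0.75, divide by 0.9 if rho < 0.25,
    leave unchanged otherwise."""
    if rho > 0.75:        # quadratic model is trustworthy: relax damping
        return lam * 0.9
    elif rho < 0.25:      # model over-predicts progress: increase damping
        return lam / 0.9
    return lam            # otherwise keep the current value

lam = 0.1                 # initial value reported in the paper
```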