Unbiased Online Recurrent Optimization
Authors: Corentin Tallec, Yann Ollivier
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, UORO is shown to provide convergence on a set of synthetic experiments where truncated BPTT fails to display reliable convergence. An implementation of UORO is provided as supplementary material. |
| Researcher Affiliation | Academia | Corentin Tallec, Laboratoire de Recherche en Informatique, Université Paris Sud, Gif-sur-Yvette, 91190, France, corentin.tallec@u-psud.fr; Yann Ollivier, Laboratoire de Recherche en Informatique, Université Paris Sud, Gif-sur-Yvette, 91190, France, yann@yann-ollivier.org |
| Pseudocode | Yes | The resulting algorithm is detailed in Alg. 1. |
| Open Source Code | Yes | An implementation of UORO is provided as supplementary material. |
| Open Datasets | Yes | To monitor the variance of UORO's estimate over time, a 64-unit GRU recurrent network is trained on the first 10^7 characters of the full works of Shakespeare using UORO. |
| Dataset Splits | No | The paper describes training on sequences and evaluation, but does not specify a distinct validation set with explicit split percentages or counts for hyperparameter tuning. For example, "Optimization was performed using Adam with the default setting β1 = 0.9 and β2 = 0.999, and a decreasing learning rate ηt = γ / (1 + α√t), with t the number of characters processed." |
| Hardware Specification | No | The paper mentions using a "64-unit GRU recurrent network" but does not specify any hardware components like CPU or GPU models, memory, or specific computing environments used for the experiments. |
| Software Dependencies | No | The paper mentions using "Adam with the default setting β1 = 0.9 and β2 = 0.999" and "vanilla SGD", but does not provide version numbers for any specific software libraries or frameworks (e.g., TensorFlow, PyTorch, Python version) that would be needed for replication. |
| Experiment Setup | Yes | Optimization was performed using Adam with the default setting β1 = 0.9 and β2 = 0.999, and a decreasing learning rate ηt = γ / (1 + α√t), with t the number of characters processed. ... (with learning rates using α = 0.015 and γ = 10^-3). ... The learning rates used α = 0.03 and γ = 10^-3. (A sketch of this schedule follows the table.) |
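
The learning-rate schedule quoted in the Experiment Setup row can be made concrete with a short sketch. The constants come from the quotes above (α = 0.015 or 0.03, γ = 10^-3); the reading of the extracted formula as ηt = γ / (1 + α√t), the function name, and the use of plain Python/NumPy are assumptions for illustration, since the paper does not name its software stack.

```python
import numpy as np

def decayed_learning_rate(t, gamma=1e-3, alpha=0.015):
    """Assumed schedule eta_t = gamma / (1 + alpha * sqrt(t)),
    with t the number of characters (or samples) processed so far."""
    return gamma / (1.0 + alpha * np.sqrt(t))

# Example values for the two settings quoted above:
# alpha = 0.015 and alpha = 0.03, both with gamma = 1e-3.
for t in (0, 1_000, 100_000, 10_000_000):
    print(f"t={t:>10}  lr(alpha=0.015)={decayed_learning_rate(t, alpha=0.015):.2e}  "
          f"lr(alpha=0.03)={decayed_learning_rate(t, alpha=0.03):.2e}")
```

In a replication, this decayed value would be supplied as the step size to the optimizer at each update (Adam with β1 = 0.9 and β2 = 0.999 per the quotes above), rather than a fixed learning rate.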