Learning Simple Algorithms from Examples
Authors: Wojciech Zaremba, Tomas Mikolov, Armand Joulin, Rob Fergus
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on sequences far longer than those present during training. Surprisingly, we find that controllers with even modest capacity to recall previous states can easily overfit the short training sequences and not generalize to the test examples, even if the correct actions are provided. |
| Researcher Affiliation | Collaboration | Wojciech Zaremba (WOJ.ZAREMBA@GMAIL.COM), New York University; Tomas Mikolov, Armand Joulin, Rob Fergus ({TMIKOLOV,AJOULIN,ROBFERGUS}@FB.COM), Facebook AI Research. The first author is now at OpenAI. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper describes custom tasks and data generation for experiments, but does not provide access information (links, DOIs, repositories, or formal citations) for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions training stops once 100% accuracy is achieved on "held-out examples" and when models solve "validation sequences of length 100", but it does not provide specific details on the dataset splits (e.g., exact percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, that would be needed to replicate the experiment. |
| Experiment Setup | Yes | The GRU model is trained with a batch size of 20, a learning rate of α = 0.1, using the same initialization as (Glorot & Bengio, 2010) but multiplied by 2. All tasks are trained with the same curriculum used in the supervised experiments (and in (Joulin & Mikolov, 2015)), whereby the sequences are initially of complexity 6 (corresponding to 2 or 3 digits, depending on the task) and once 100% accuracy is achieved, increased by 4 until the model is able to solve validation sequences of length 100. |
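The setup row above can be sketched as a short script. This is a hedged illustration, not the authors' code: `glorot_uniform_times_two` and `curriculum_schedule` are hypothetical helper names, and the exact stopping point of the curriculum (how the step-by-4 schedule meets the length-100 target) is an assumption.

```python
import math
import random

# Hyperparameters stated in the paper's experiment setup.
BATCH_SIZE = 20
LEARNING_RATE = 0.1  # alpha = 0.1


def glorot_uniform_times_two(fan_in, fan_out):
    """Glorot & Bengio (2010) uniform initialization, multiplied by 2
    as described in the paper. Returns a fan_in x fan_out weight matrix
    as a list of lists (a plain-Python stand-in for a tensor)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[2.0 * random.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def curriculum_schedule(start=6, step=4, max_complexity=100):
    """Yield the curriculum complexities: start at 6 and, each time the
    model reaches 100% accuracy on held-out examples, increase by 4.
    Stopping once the target length 100 is covered is an assumption;
    the paper does not spell out the final step."""
    complexity = start
    while complexity <= max_complexity:
        yield complexity
        complexity += step


if __name__ == "__main__":
    print(list(curriculum_schedule())[:5])  # first few curriculum stages
```

The schedule produced is 6, 10, 14, ..., matching the paper's statement that complexity starts at 6 (2 or 3 digits, depending on the task) and grows by 4 per curriculum stage.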