Learning Simple Algorithms from Examples

Authors: Wojciech Zaremba, Tomas Mikolov, Armand Joulin, Rob Fergus

ICML 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We evaluate our model on sequences far longer than those present during training. Surprisingly, we find that controllers with even modest capacity to recall previous states can easily overfit the short training sequences and not generalize to the test examples, even if the correct actions are provided." |
| Researcher Affiliation | Collaboration | Wojciech Zaremba (woj.zaremba@gmail.com), New York University; Tomas Mikolov, Armand Joulin, and Rob Fergus ({tmikolov,ajoulin,robfergus}@fb.com), Facebook AI Research. The author is now at OpenAI. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper describes custom tasks and data generation for its experiments, but does not provide access information (links, DOIs, repositories, or formal citations) for a publicly available dataset. |
| Dataset Splits | No | The paper mentions that training stops once 100% accuracy is achieved on "held-out examples" and that models must solve "validation sequences of length 100", but it does not specify the dataset splits (e.g., exact percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models, processor types, or memory amounts, used to run its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies, such as library names with version numbers, that would be needed to replicate the experiments. |
| Experiment Setup | Yes | The GRU model is trained with a batch size of 20 and a learning rate of α = 0.1, using the same initialization as Glorot & Bengio (2010) but multiplied by 2. All tasks are trained with the curriculum used in the supervised experiments (and in Joulin & Mikolov, 2015): sequences start at complexity 6 (corresponding to 2 or 3 digits, depending on the task) and, once 100% accuracy is achieved, complexity is increased by 4 until the model solves validation sequences of length 100. |
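The Experiment Setup row is the only fully specified part of the methodology, so a sketch can make it concrete. Below is a minimal illustration, assuming PyTorch; since the paper releases no code, the controller size, vocabulary size, training budget, and the `make_batch`/`accuracy` helpers are hypothetical stand-ins for its task-specific data generator and held-out validation.

```python
# Minimal sketch of the reported setup, assuming PyTorch. The paper
# releases no code, so the controller size, training budget, and the
# make_batch()/accuracy() helpers are hypothetical placeholders.
import torch
import torch.nn as nn

N_SYMBOLS = 16       # vocabulary size: an assumption
BATCH_SIZE = 20      # batch size of 20, as reported
LEARNING_RATE = 0.1  # alpha = 0.1, as reported

class Controller(nn.Module):
    """GRU controller over symbol sequences (hidden size is an assumption)."""
    def __init__(self, n_symbols=N_SYMBOLS, hidden=200):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_symbols)

    def forward(self, x, h=None):
        y, h = self.gru(self.embed(x), h)
        return self.out(y), h

def init_weights(model):
    # Glorot & Bengio (2010) initialization, multiplied by 2, as reported.
    for p in model.parameters():
        if p.dim() >= 2:
            nn.init.xavier_uniform_(p)
            with torch.no_grad():
                p.mul_(2.0)

def make_batch(complexity):
    # Placeholder for the paper's task generator: random input/target
    # sequences of the given length stand in for the real tasks.
    x = torch.randint(N_SYMBOLS, (BATCH_SIZE, complexity))
    y = torch.randint(N_SYMBOLS, (BATCH_SIZE, complexity))
    return x, y

def accuracy(model, complexity):
    # Placeholder for validation on held-out examples.
    with torch.no_grad():
        x, y = make_batch(complexity)
        logits, _ = model(x)
        return (logits.argmax(-1) == y).float().mean().item()

model = Controller()
init_weights(model)
opt = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
loss_fn = nn.CrossEntropyLoss()

# Curriculum: start at complexity 6; each time the model reaches 100%
# held-out accuracy, increase complexity by 4, up to length 100.
complexity, max_steps = 6, 10_000  # training budget: an assumption
for step in range(max_steps):
    x, y = make_batch(complexity)
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, N_SYMBOLS), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0 and accuracy(model, complexity) == 1.0:
        complexity += 4
        if complexity > 100:
            break  # solved validation sequences of length 100
```

Note that the ×2 scaling is applied directly on top of the Xavier draw, matching the paper's "same initialization as Glorot & Bengio (2010) but multiplied by 2"; everything else in the loop beyond the batch size, learning rate, and curriculum schedule is an assumption.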