Programming With a Differentiable Forth Interpreter
Authors: Matko Bošnjak, Tim Rocktäschel, Jason Naradowsky, Sebastian Riedel
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex transduction tasks such as sequence sorting or addition with substantially less data and better generalisation over problem sizes. In addition, we introduce neural program optimisations based on symbolic computation and parallel branching that lead to significant speed improvements. |
| Researcher Affiliation | Academia | Matko Bošnjak, Tim Rocktäschel, Jason Naradowsky & Sebastian Riedel, Department of Computer Science, University College London, London, UK {m.bosnjak, t.rocktaschel, j.narad, s.riedel}@cs.ucl.ac.uk |
| Pseudocode | No | The paper includes examples of Forth code (Listings 1 and 2) and mathematical descriptions of differentiable Forth words (Table 4), but does not provide structured pseudocode or algorithm blocks for its overall methodology. (A hedged sketch of the soft-stack idea behind these word definitions is given after the table.) |
| Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | No | We test ∂4 on the sorting and addition tasks presented in Reed & de Freitas (2015) with varying levels of program structure. This cites the source of the *tasks*, but does not explicitly provide access information for the *datasets* used in their experiments. No specific dataset names like MNIST, CIFAR, etc., are mentioned with links or formal citations. |
| Dataset Splits | No | The paper states 'Hyperparameters were tuned via random search on a development variant of each task', implying a validation set, but does not provide specific details on dataset splits (e.g., percentages or sample counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions the use of Adam optimizer, but does not provide specific version numbers for any software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | The parameters of each sketch are trained using Adam (Kingma & Ba, 2014), with gradient clipping and gradient noise (Neelakantan et al., 2015b). Hyperparameters were tuned via random search on a development variant of each task, for 1000 epochs, repeating each experiment 5 times. During testing we employ memory element discretisation, replacing differentiable stacks and pointers with their discrete counterparts, and effectively allowing the trained model to generalize to any sequence length if the correct sketch behavior has been learned. To illustrate the generalization ability of this architecture, we compare against a Seq2Seq (Sutskever et al., 2014) baseline. All Seq2Seq models are single-layer, with a hidden size of 50, trained similarly for 1000 epochs using Adam. (A hedged sketch of this training setup appears after the table.) |
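The "Pseudocode" row notes that the differentiable Forth words are specified only mathematically (Table 4 of the paper). The following minimal NumPy sketch illustrates the underlying idea of a stack addressed by a soft pointer distribution; the cell count, value dimensionality, and the ordering of pointer shifts relative to reads and writes are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

# Buffer of 4 stack cells, each a 3-dimensional value vector, and a pointer
# distribution that starts fully concentrated on cell 0 (assumed sizes).
S = np.zeros((4, 3))
p = np.array([1.0, 0.0, 0.0, 0.0])

def soft_write(S, p, x):
    # Each cell is overwritten in proportion to its pointer weight: cells with
    # high pointer mass are (softly) replaced by the new value x.
    return S * (1.0 - p)[:, None] + np.outer(p, x)

def soft_read(S, p):
    # Expected top-of-stack value under the pointer distribution.
    return S.T @ p

def shift(p, direction):
    # Move the pointer distribution one slot up (+1) or down (-1), circularly.
    return np.roll(p, direction)

def push(S, p, x):
    p = shift(p, +1)
    return soft_write(S, p, x), p

def pop(S, p):
    x = soft_read(S, p)
    return x, shift(p, -1)

S, p = push(S, p, np.array([1.0, 2.0, 3.0]))
top, p = pop(S, p)   # recovers [1, 2, 3] while the pointer is still sharp
```

Because every operation is a weighted sum, the whole read/write cycle stays differentiable with respect to both the stack contents and the pointer, which is what allows the sketch parameters to be trained end-to-end.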
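The "Experiment Setup" row reports training with Adam, gradient clipping, and gradient noise for 1000 epochs. The self-contained PyTorch sketch below shows one way to wire these pieces together; the model, loss, learning rate, noise-schedule constants (eta, gamma), and clipping threshold are illustrative assumptions, since the paper tunes hyperparameters by random search and does not report them.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; the paper trains the parameters of the d4 sketch
# itself, and its Seq2Seq baseline is a single-layer model with hidden size 50.
model = nn.Sequential(nn.Linear(8, 50), nn.Tanh(), nn.Linear(50, 8))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate assumed
loss_fn = nn.MSELoss()                                     # placeholder task loss

def add_gradient_noise(params, step, eta=0.01, gamma=0.55):
    # Annealed Gaussian gradient noise in the spirit of Neelakantan et al. (2015):
    # variance eta / (1 + step)^gamma; eta and gamma values are assumptions.
    sigma = (eta / (1 + step) ** gamma) ** 0.5
    for param in params:
        if param.grad is not None:
            param.grad.add_(torch.randn_like(param.grad) * sigma)

x, y = torch.randn(32, 8), torch.randn(32, 8)  # dummy batch standing in for task data

for epoch in range(1000):  # 1000 epochs, as reported
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    add_gradient_noise(model.parameters(), epoch)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # threshold assumed
    optimizer.step()
```

In this arrangement the noise is injected into the raw gradients before clipping; the paper does not specify the ordering, so the reverse is equally plausible.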