Training Neural Machines with Trace-Based Supervision

Authors: Matthew Mirman, Dimitar Dimitrov, Pavle Djordjevic, Timon Gehr, Martin Vechev

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We performed a detailed experimental evaluation with NTM and NRAM machines, showing that additional supervision on the interpretable portions of these architectures leads to better convergence and generalization of the learning phase than standard training, in both noise-free and noisy scenarios.
Researcher Affiliation | Academia | Department of Computer Science, ETH Zurich, Switzerland.
Pseudocode | No | The paper describes machine structures and equations but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | All of the code, tasks and experiments are available at: https://github.com/eth-sri/ncm
Open Datasets | No | The paper refers to 'algorithmic tasks (mostly from the NTM and NRAM papers)' such as 'Flip3rd', 'Swap', and 'Merge', but does not provide concrete access information (link, DOI, or specific citation with authors/year) for the data used in these tasks.
Dataset Splits | No | The paper mentions training on examples of size n and testing on examples of size 1.5n and 2n, and states 'A maximum of 10000 samples were used for the DNGPU and 5000 for the NRAM', but it does not specify explicit percentages or sample counts for training, validation, and test splits, nor does it mention a validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper states 'The DNGPU was run out of the box from the code supplied by the authors' but does not provide version numbers for any software dependencies, libraries, or frameworks used.
Experiment Setup | Yes | The different supervision types are shown vertically, while the proportion of examples that receive extra subtrace supervision (density) and the extra loss term weight (λ) are shown horizontally. The best results in this case are for the read/corner type of hints for 1/2 or 1/10 of the examples, with λ ∈ {0.1, 1}.
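As a rough illustration of the setup described in the Experiment Setup row, the sketch below combines a standard task loss with an extra subtrace-supervision term weighted by λ and applied only to a `density` fraction of the training examples. It is a minimal sketch under assumed interfaces: the names (`combined_loss`, `trace`, `trace_targets`, the tensor shapes) are hypothetical and are not taken from the paper or the eth-sri/ncm code.

```python
import torch
import torch.nn.functional as F

def combined_loss(outputs, targets, trace, trace_targets, lam=0.1, density=0.5):
    """Hypothetical sketch: task loss plus trace-based supervision.

    outputs, targets       : task prediction and ground truth (batch-first)
    trace, trace_targets   : the machine's interpretable execution trace
                             (e.g. read/write head positions) and the hinted
                             subtrace values for it
    lam (λ)                : weight of the extra supervision term
    density                : fraction of examples that receive the extra term
    """
    # Standard end-to-end task loss.
    task_loss = F.cross_entropy(outputs, targets)

    # Per-example error against the hinted subtrace (squared distance).
    per_example_trace_loss = ((trace - trace_targets) ** 2).flatten(1).mean(dim=1)

    # Only a `density` fraction of the examples actually gets the hint loss.
    mask = (torch.rand(per_example_trace_loss.shape[0]) < density).float()
    trace_loss = (mask * per_example_trace_loss).mean()

    return task_loss + lam * trace_loss
```

In this reading, the grid described in the row corresponds to sweeping the supervision type (which part of the trace is hinted) against density ∈ {1/2, 1/10, ...} and λ ∈ {0.1, 1}.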