A Regularized Framework for Sparse and Structured Neural Attention

Authors: Vlad Niculae, Mathieu Blondel

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To showcase their potential as a drop-in replacement for existing ones, we evaluate our attention mechanisms on three large-scale tasks: textual entailment, machine translation, and sentence summarization. Our attention mechanisms improve interpretability without sacrificing performance; notably, on textual entailment and summarization, we outperform the standard attention mechanisms based on softmax and sparsemax."
Researcher Affiliation | Collaboration | Vlad Niculae (Cornell University, Ithaca, NY; vlad@cs.cornell.edu) and Mathieu Blondel (NTT Communication Science Laboratories, Kyoto, Japan; mathieu@mblondel.org)
Pseudocode | No | The paper describes algorithms and derivations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states "We build on OpenNMT-py [24], based on PyTorch [37]" and "We employ the CPU implementation provided in [31]", but it neither links to nor announces a release of the authors' own implementation of the proposed mechanisms.
Open Datasets | Yes | "We use the Stanford Natural Language Inference (SNLI) dataset [8]... we use the standard DUC 2004 dataset ... and a randomly held-out subset of Gigaword, released by [39]."
Dataset Splits | No | The paper uses standard datasets and follows the methodology of prior work ([31], [39]), which implies predefined splits, but it never states explicit percentages or sample counts for the training/validation/test splits; for Gigaword, for example, it mentions only a "randomly held-out subset" without giving its size.
Hardware Specification | No | The paper mentions running OpenNMT-py on "GPU" and certain operations on "CPU", but gives no specific hardware details such as GPU/CPU models or memory configurations.
Software Dependencies | No | The paper mentions building on "OpenNMT-py [24], based on PyTorch [37]" but does not give version numbers for these software dependencies.
Experiment Setup | Yes | "To mitigate this effect, we set the tolerance of the solver's stopping criterion to 10^-2. While tuning λ may improve performance, we observe that λ = 0.1 (fusedmax) and λ = 0.01 (oscarmax) are sensible defaults that work well across all tasks and report all our results using them."
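
For readers checking the setup above against their own reimplementation: the paper shows that fusedmax can be computed by composing the proximal operator of the 1-d total-variation (fused lasso) penalty with a Euclidean projection onto the simplex, with λ controlling the fusion strength. The sketch below is a minimal NumPy illustration of that composition under those stated assumptions, not the authors' code; `prox_tv1d` here uses CVXPY as a slow but exact stand-in for the fast TV solver the paper takes from [31], and the names `fusedmax` and `projection_simplex` are chosen for illustration only.

import numpy as np
import cvxpy as cp  # slow, exact reference solver for the TV prox (not what the paper uses)

def prox_tv1d(z, lam):
    """Reference 1-d total-variation prox:
    argmin_y 0.5 * ||y - z||^2 + lam * sum_i |y[i+1] - y[i]|."""
    y = cp.Variable(len(z))
    objective = 0.5 * cp.sum_squares(y - z) + lam * cp.norm1(cp.diff(y))
    cp.Problem(cp.Minimize(objective)).solve()
    return y.value

def projection_simplex(v):
    """Euclidean projection onto the probability simplex
    (applied on its own, this is exactly sparsemax)."""
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > cssv)[0][-1]
    theta = cssv[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def fusedmax(z, lam=0.1):
    """Fusedmax = TV prox followed by simplex projection; lam=0.1 is the
    default the paper reports using across all tasks."""
    return projection_simplex(prox_tv1d(np.asarray(z, dtype=float), lam))

# Example: nearby scores get fused into equal attention weights.
print(fusedmax(np.array([1.0, 1.1, 3.0, 0.2])))

Oscarmax follows the same composition pattern, with the OSCAR proximal operator (and the paper's default λ = 0.01) in place of the TV prox.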