A Regularized Framework for Sparse and Structured Neural Attention
Authors: Vlad Niculae, Mathieu Blondel
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To showcase their potential as a drop-in replacement for existing ones, we evaluate our attention mechanisms on three large-scale tasks: textual entailment, machine translation, and sentence summarization. Our attention mechanisms improve interpretability without sacrificing performance; notably, on textual entailment and summarization, we outperform the standard attention mechanisms based on softmax and sparsemax. |
| Researcher Affiliation | Collaboration | Vlad Niculae, Cornell University, Ithaca, NY (vlad@cs.cornell.edu); Mathieu Blondel, NTT Communication Science Laboratories, Kyoto, Japan (mathieu@mblondel.org) |
| Pseudocode | No | The paper describes algorithms and derivations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states "We build on OpenNMT-py [24], based on PyTorch [37]" and "We employ the CPU implementation provided in [31]" but does not explicitly provide a link or statement about releasing their own implementation of the proposed mechanisms. |
| Open Datasets | Yes | We use the Stanford Natural Language Inference (SNLI) dataset [8]... we use the standard DUC 2004 dataset ... and a randomly held-out subset of Gigaword, released by [39]. |
| Dataset Splits | No | The paper mentions using standard datasets and following the methodologies of other papers ([31], [39]), which imply predefined splits, but it does not explicitly state the percentages or sample counts of the training/validation/test splits for all datasets; for example, for Gigaword it states only 'randomly held-out subset' without specifying its size. |
| Hardware Specification | No | The paper mentions using a 'GPU' for OpenNMT-py and a 'CPU' for certain operations but does not provide specific hardware details such as GPU/CPU models or memory configurations. |
| Software Dependencies | No | The paper mentions building on 'OpenNMT-py [24], based on PyTorch [37]' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | To mitigate this effect, we set the tolerance of the solver's stopping criterion to 10⁻². While tuning λ may improve performance, we observe that λ = 0.1 (fusedmax) and λ = 0.01 (oscarmax) are sensible defaults that work well across all tasks and report all our results using them. (A reference sketch of the underlying sparsemax projection follows this table.) |
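
The fusedmax and oscarmax mechanisms evaluated above are regularized variants built on top of the sparsemax projection (Martins & Astudillo, 2016), which the paper contrasts with softmax attention. As a point of reference for the experiment-setup defaults quoted above, here is a minimal NumPy sketch of sparsemax; it is not the authors' released code, and the function name and example scores are purely illustrative.

```python
# Minimal sketch of the sparsemax projection: Euclidean projection of a score
# vector onto the probability simplex, yielding a sparse attention distribution.
# Illustrative only; the paper's fusedmax/oscarmax add structured regularizers
# (with lambda = 0.1 / 0.01 defaults) on top of this building block.
import numpy as np

def sparsemax(z):
    """Project scores z onto the simplex; returns a sparse probability vector."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum    # coordinates kept in the support
    k_max = k[support][-1]                 # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

if __name__ == "__main__":
    # Example: the smallest score is zeroed out, unlike with softmax.
    print(sparsemax([2.0, 1.5, 0.1]))      # -> [0.75, 0.25, 0.0]
```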