From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

Authors: André Martins, Ramón Astudillo

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We obtain promising empirical results in multi-label classification problems and in attention-based neural networks for natural language inference. For the latter, we achieve a similar performance as the traditional softmax, but with a selective, more compact, attention focus. We next evaluate empirically the ability of sparsemax for addressing two classes of problems: 1. Label proportion estimation and multi-label classification... 2. Attention-based neural networks... We ran experiments on the task of natural language inference, using the recently released SNLI 1.0 corpus (Bowman et al., 2015)...
Researcher Affiliation | Collaboration | André F. T. Martins (ANDRE.MARTINS@UNBABEL.COM), Ramón F. Astudillo (RAMON@UNBABEL.COM). Unbabel Lda, Rua Visconde de Santarém, 67-B, 1000-286 Lisboa, Portugal; Instituto de Telecomunicações (IT), Instituto Superior Técnico, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal; Instituto de Engenharia de Sistemas e Computadores (INESC-ID), Rua Alves Redol, 9, 1000-029 Lisboa, Portugal
Pseudocode | Yes | Algorithm 1 (Sparsemax Evaluation): Input: z. Sort z as z_(1) ≥ … ≥ z_(K). Find k(z) := max{k ∈ [K] | 1 + k·z_(k) > Σ_{j≤k} z_(j)}. Define τ(z) = (Σ_{j≤k(z)} z_(j) − 1) / k(z). Output: p such that p_i = [z_i − τ(z)]_+. (A runnable sketch of this algorithm follows the table.)
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology described in the paper is openly available.
Open Datasets | Yes | We ran experiments on the task of natural language inference, using the recently released SNLI 1.0 corpus (Bowman et al., 2015)... multi-label classification datasets: the four small-scale datasets used by Koyejo et al. (2015), obtained from http://mulan.sourceforge.net/datasets-mlc.html, and the much larger Reuters RCV1-v2 dataset of Lewis et al. (2004), obtained from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html.
Dataset Splits | Yes | We used the provided training, development, and test splits. ...tuning the hyperparameters in a heldout validation set (for the Reuters dataset) and with 5-fold cross-validation (for the other four datasets).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. It mentions "GPU-friendly" but does not name the actual hardware used for the empirical evaluation.
Software Dependencies | No | The paper mentions software components and algorithms such as Adam (Kingma & Ba, 2014), GloVe vectors (Pennington et al., 2014), gated recurrent units (GRUs; Cho et al., 2014), and L-BFGS (Liu & Nocedal, 1989; Nesterov, 1983), but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | We optimized all the systems with Adam (Kingma & Ba, 2014), using the default parameters β1 = 0.9, β2 = 0.999, and ϵ = 10⁻⁸, and setting the learning rate to 3 × 10⁻⁴. We tuned a ℓ2-regularization coefficient in {0, 10⁻⁴, 3 × 10⁻⁴, 10⁻³} and, as Rocktäschel et al. (2015), a dropout probability of 0.1 in the inputs and outputs of the network. ...for a maximum of 100 epochs... (An optimizer-configuration sketch follows the table.)
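
Below is a minimal NumPy sketch of Algorithm 1 as reproduced in the Pseudocode row above. The function name and array handling are our own; the paper gives only pseudocode, so treat this as an illustration rather than the authors' implementation.

```python
import numpy as np

def sparsemax(z):
    """Evaluate sparsemax(z): the Euclidean projection of the score
    vector z onto the probability simplex (Algorithm 1 of the paper)."""
    z = np.asarray(z, dtype=float)
    # Sort scores in descending order: z_(1) >= ... >= z_(K).
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)           # partial sums of sorted scores
    k = np.arange(1, z.size + 1)
    # k(z) := max{ k in [K] : 1 + k * z_(k) > sum_{j<=k} z_(j) }.
    support = 1 + k * z_sorted > cumsum
    k_z = k[support][-1]
    # tau(z) = (sum_{j<=k(z)} z_(j) - 1) / k(z).
    tau = (cumsum[k_z - 1] - 1.0) / k_z
    # p_i = [z_i - tau(z)]_+ ; scores below the threshold become exact zeros.
    return np.maximum(z - tau, 0.0)
```

For example, sparsemax([1.2, 0.9, 0.1]) returns [0.65, 0.35, 0.0]: the output still sums to one, but the weakest score is truncated to an exact zero, which is the "selective, more compact" attention behavior the abstract describes (softmax would keep all three entries strictly positive).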
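
The Experiment Setup row translates directly into an optimizer configuration. The sketch below uses PyTorch's torch.optim.Adam; the framework choice and the model stub are assumptions (the paper does not state which library was used), and weight_decay is shown at one point of the tuned grid.

```python
import torch

# Stand-in module; the paper's actual architecture (GRU encoders with
# softmax/sparsemax attention) is not reproduced here.
model = torch.nn.Linear(300, 3)

# Settings quoted in the row above: Adam with beta1 = 0.9,
# beta2 = 0.999, eps = 1e-8, and learning rate 3e-4.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=3e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=3e-4,  # one point of the tuned grid {0, 1e-4, 3e-4, 1e-3}
)

# Dropout probability 0.1 on the inputs and outputs of the network,
# following Rocktäschel et al. (2015), as the paper reports.
dropout = torch.nn.Dropout(p=0.1)
```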