Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
Authors: Andre Martins, Ramon Astudillo
ICML 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We obtain promising empirical results in multi-label classification problems and in attention-based neural networks for natural language inference. For the latter, we achieve a similar performance as the traditional softmax, but with a selective, more compact, attention focus. We next evaluate empirically the ability of sparsemax for addressing two classes of problems: 1. Label proportion estimation and multi-label classification... 2. Attention-based neural networks... We ran experiments on the task of natural language inference, using the recently released SNLI 1.0 corpus (Bowman et al., 2015)... |
| Researcher Affiliation | Collaboration | André F. T. Martins EMAIL Ramón F. Astudillo EMAIL Unbabel Lda, Rua Visconde de Santarém, 67-B, 1000-286 Lisboa, Portugal Instituto de Telecomunicações (IT), Instituto Superior Técnico, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal Instituto de Engenharia de Sistemas e Computadores (INESC-ID), Rua Alves Redol, 9, 1000-029 Lisboa, Portugal |
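| Pseudocode | Yes | Algorithm 1 Sparsemax Evaluation. Input: $z$. Sort $z$ as $z_{(1)} \geq \dots \geq z_{(K)}$. Find $k(z) := \max \{ k \in [K] \mid 1 + k z_{(k)} > \sum_{j \leq k} z_{(j)} \}$. Define $\tau(z) = \frac{\left(\sum_{j \leq k(z)} z_{(j)}\right) - 1}{k(z)}$. Output: $p$ s.t. $p_i = [z_i - \tau(z)]_+$. |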
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology described in the paper is openly available. |
| Open Datasets | Yes | We ran experiments on the task of natural language inference, using the recently released SNLI 1.0 corpus (Bowman et al., 2015)... multi-label classification datasets: the four small-scale datasets used by Koyejo et al. (2015) [footnote 7] and the much larger Reuters RCV1 v2 dataset of Lewis et al. (2004) [footnote 8]. Footnote 7: Obtained from http://mulan.sourceforge.net/datasets-mlc.html. Footnote 8: Obtained from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html. |
| Dataset Splits | Yes | We used the provided training, development, and test splits. ...tuning the hyperparameters in a heldout validation set (for the Reuters dataset) and with 5-fold cross-validation (for the other four datasets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. It mentions 'GPU-friendly' but no actual hardware used for the empirical evaluation. |
| Software Dependencies | No | The paper mentions software components and algorithms like 'Adam (Kingma & Ba, 2014)', 'GloVe vectors (Pennington et al., 2014)', 'gated recurrent units (GRUs, Cho et al. 2014)', 'L-BFGS (Liu & Nocedal, 1989; Nesterov, 1983)' but does not provide specific version numbers for any of these or other software dependencies. |
| Experiment Setup | Yes | We optimized all the systems with Adam (Kingma & Ba, 2014), using the default parameters $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$, and setting the learning rate to $3 \times 10^{-4}$. We tuned a $\ell_2$-regularization coefficient in $\{0, 10^{-4}, 3 \times 10^{-4}, 10^{-3}\}$ and, as Rocktäschel et al. (2015), a dropout probability of 0.1 in the inputs and outputs of the network. ...for a maximum of 100 epochs... |
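
The Pseudocode row above quotes Algorithm 1 (Sparsemax Evaluation). As a rough illustration of those four steps, here is a minimal pure-Python sketch. It is not the authors' implementation (the Open Source Code row records that none was released), and the function name `sparsemax` and its list-based signature are assumptions made for this example.

```python
from typing import List, Sequence


def sparsemax(z: Sequence[float]) -> List[float]:
    """Illustrative sketch of Algorithm 1; name and signature are assumed."""
    if not z:
        raise ValueError("z must be non-empty")

    # Step 1: sort the scores in decreasing order, z_(1) >= ... >= z_(K).
    z_sorted = sorted(z, reverse=True)

    # Step 2: find k(z), the largest k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    # The condition always holds for k = 1, so k_z ends up at least 1.
    cumulative = 0.0
    k_z = 0
    support_sum = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumulative += z_k
        if 1.0 + k * z_k > cumulative:
            k_z = k
            support_sum = cumulative

    # Step 3: threshold tau(z) = (sum_{j<=k(z)} z_(j) - 1) / k(z).
    tau = (support_sum - 1.0) / k_z

    # Step 4: p_i = [z_i - tau(z)]_+, returned in the original input order.
    return [max(z_i - tau, 0.0) for z_i in z]


print(sparsemax([2.0, 1.0, 0.1]))  # → [1.0, 0.0, 0.0]
```

In this example the largest score exceeds the others by at least 1, so all probability mass lands on it and the remaining entries are exactly zero. This is the sparse behaviour the paper contrasts with softmax.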