On Controllable Sparse Alternatives to Softmax

Authors: Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, Harish G. Ramaswamy

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here we present two sets of evaluations for the proposed probability mapping functions and loss functions. First, we apply them on the multilabel classification task studying the effect of varying label density in synthetic dataset, followed by evaluation on real multilabel datasets. Next, we report results of sparse attention on NLP tasks of machine translation and abstractive summarization."
Researcher Affiliation | Collaboration | Anirban Laha (1), Saneem A. Chemmengath (1), Priyanka Agrawal (1), Mitesh M. Khapra (2), Karthik Sankaranarayanan (1), Harish G. Ramaswamy (2); (1) IBM Research, (2) Robert Bosch Center for DS and AI, and Dept. of CSE, IIT Madras
Pseudocode | No | The paper describes algorithms verbally (e.g., a "modified randomized median finding algorithm") but does not provide structured pseudocode or algorithm blocks.
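For orientation on the missing pseudocode: the paper's mapping functions build on the sparsemax projection of Martins and Astudillo (2016), which the authors compute with a modified randomized median-finding scheme. Below is a minimal sketch of the simpler, well-known O(K log K) sort-based variant of that projection; it is an illustrative assumption, not the authors' implementation.

```python
def sparsemax(z):
    """Euclidean projection of a score vector z onto the probability
    simplex, producing a sparse probability vector. Sort-based
    O(K log K) variant; the paper's median-finding scheme computes
    the same output faster."""
    z_sorted = sorted(z, reverse=True)
    cumsum, k, k_cumsum = 0.0, 0, 0.0
    for i, z_i in enumerate(z_sorted, start=1):
        cumsum += z_i
        # Support condition: coordinate i remains nonzero after shifting.
        if 1.0 + i * z_i > cumsum:
            k, k_cumsum = i, cumsum
    tau = (k_cumsum - 1.0) / k  # threshold chosen so the output sums to 1
    return [max(z_i - tau, 0.0) for z_i in z]

# Example: the largest score dominates and the rest are zeroed out.
print(sparsemax([2.0, 1.0, 0.1]))  # [1.0, 0.0, 0.0]
```

Unlike softmax, which always assigns nonzero mass everywhere, this projection can output exact zeros, which is the sparsity property the paper's controllable variants generalize.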
Open Source Code | No | The paper does not provide any links to source code and does not explicitly state that the code for the methodology has been released.
Open Datasets | Yes | "We further experiment with three real datasets for multilabel classification: Birds, Scene and Emotions. The experimental setup and baselines are same as that for synthetic dataset described in Sec. 5.1.1. For each of the datasets, we consider only those examples with at least one label. Results are shown in Table 3 in App. A.7.2." (Datasets available at http://mulan.sourceforge.net/datasets-mlc.html.)
Dataset Splits | No | The paper mentions the use of a validation set ("We tune hyperparams q for sparsehg+hinge and p0 for softmax+log using validation set") but does not provide specific details of the train/validation/test splits (e.g., percentages or absolute counts).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper names software components ("We use scikit-learn for generating synthetic datasets... on top of the Open NMT framework [18]") but does not provide specific version numbers for them.
Experiment Setup | Yes | "We tune hyperparams q for sparsehg+hinge and p0 for softmax+log using validation set. ... We varied only the control parameters required by our formulations. The models for the different control parameters were trained for 13 epochs and the epoch with the best validation accuracy is chosen as the best model for that setting."