On Controllable Sparse Alternatives to Softmax
Authors: Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, Harish G. Ramaswamy
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we present two sets of evaluations for the proposed probability mapping functions and loss functions. First, we apply them to the multilabel classification task, studying the effect of varying label density in a synthetic dataset, followed by evaluation on real multilabel datasets. Next, we report results of sparse attention on the NLP tasks of machine translation and abstractive summarization. |
| Researcher Affiliation | Collaboration | Anirban Laha1 Saneem A. Chemmengath1 Priyanka Agrawal1 Mitesh M. Khapra2 Karthik Sankaranarayanan1 Harish G. Ramaswamy2 1 IBM Research 2 Robert Bosch Center for DS and AI, and Dept of CSE, IIT Madras |
| Pseudocode | No | The paper describes algorithms verbally (e.g., 'modified randomized median finding algorithm') but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links to source code or explicitly state that the code for their methodology is released. |
| Open Datasets | Yes | We further experiment with three real datasets3 for multilabel classification: Birds, Scene and Emotions. The experimental setup and baselines are the same as those for the synthetic dataset described in Sec.5.1.1. For each of the datasets, we consider only those examples with at least one label. Results are shown in Table 3 in App.A.7.2. 3Available at http://mulan.sourceforge.net/datasets-mlc.html |
| Dataset Splits | No | We tune hyperparams q for sparsehg+hinge and p0 for softmax+log using a validation set. ... The paper mentions the use of a validation set but does not provide specific details on the train/validation/test splits (e.g., percentages or absolute counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | We use scikit-learn for generating synthetic datasets... on top of the OpenNMT framework [18]. The paper mentions software components such as scikit-learn and OpenNMT but does not provide specific version numbers for them. |
| Experiment Setup | Yes | We tune hyperparams q for sparsehg+hinge and p0 for softmax+log using a validation set. ... We varied only the control parameters required by our formulations. The models for the different control parameters were trained for 13 epochs, and the epoch with the best validation accuracy was chosen as the best model for that setting. |
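The Pseudocode row notes that the paper describes its projection algorithm only verbally (a "modified randomized median finding algorithm"). As a minimal sketch of the underlying computation, the sparsemax projection onto the simplex can instead be evaluated with a standard O(K log K) sort-based routine, and the paper's sparsegen-lin variant then follows by rescaling the input by 1/(1 − λ). The function names below are illustrative, not from the paper, and the sort-based routine replaces (but is equivalent in output to) the expected-O(K) median-finding approach the paper mentions:

```python
import numpy as np

def sparsemax(z):
    """Sort-based sparsemax: Euclidean projection of z onto the
    probability simplex, yielding a sparse probability vector."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # descending order
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z_sorted)
    # support size: largest k with 1 + k * z_(k) > sum_{j<=k} z_(j)
    support = 1 + k * z_sorted > cssv
    k_z = k[support][-1]
    tau = (cssv[k_z - 1] - 1.0) / k_z    # threshold
    return np.maximum(z - tau, 0.0)

def sparsegen_lin(z, lam=0.0):
    """Sparsegen-lin with control parameter lam < 1: reduces to
    sparsemax at lam = 0 and grows sparser as lam approaches 1."""
    return sparsemax(np.asarray(z, dtype=float) / (1.0 - lam))
```

For example, `sparsemax([0.1, 1.1, 0.2])` returns a valid probability vector with one coordinate zeroed out, while `sparsegen_lin` with a larger λ drives more coordinates to exactly zero, which is the controllable-sparsity behavior the paper studies.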