Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification

Authors: Andreas Grivas, Antonio Vergari, Adam Lopez

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 experiments. "We now empirically evaluate the BSL and DFT layers on three MLC datasets and answer the following research questions: RQ1) Do BSLs have unargmaxable labels in practice? RQ2) Can DFT layers guarantee that meaningful labels are argmaxable in practice? RQ3) What is the trade-off between performance and the number of trainable parameters?" (A minimal sketch of a BSL output layer is given after this table.)
Researcher Affiliation | Academia | "Andreas Grivas, Antonio Vergari, Adam Lopez. School of Informatics, University of Edinburgh, UK. {agrivas, avergari, alopez}@ed.ac.uk"
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is publicly available at https://github.com/andreasgrv/sigmoid-bottleneck."
Open Datasets | Yes | Clinical coding (MIMIC-III): "We first test the DFT layer on MIMIC-III (Johnson et al. 2016). For this safety-critical application of clinical coding, the goal is to tag each clinical note with a set of relevant ICD-9 codes which describe findings (see Fig. 1). We retrain the CNN encoder model defined in Mullenbach et al. (2018) which has n = 8921 and e = 500." Semantic indexing (BioASQ Task A): "Next, we focus on the 2021 BioASQ semantic indexing challenge (Tsatsaronis et al. 2015; Nentidis et al. 2021; Krithara et al. 2023)." Image MLC (Open Images v6): "We use the Open Images v6 dataset (Kuznetsova et al. 2020)..."
Dataset Splits | Yes | "For this safety-critical application of clinical coding, the goal is to tag each clinical note with a set of relevant ICD-9 codes which describe findings (see Fig. 1). We retrain the CNN encoder model defined in Mullenbach et al. (2018) which has n = 8921 and e = 500. We use the same word embeddings, preprocessed data, data splits, metrics (Prec@8) and hyperparameters reported in the paper (Mullenbach et al. 2018). ... We create dataset splits (see Appendix H.2 for details) ... We use early stopping with a patience of 10 on the validation cross-entropy loss." (A sketch of the Prec@8 metric follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | "Moreover, they are the default for MLC in frameworks such as Scikit-learn (Pedregosa et al. 2011) and Keras (Paul and Rakshit 2014)... We use eps = 10^-8 since Gurobi Optimization (2021) has a minimum tolerance of 10^-9. ... We finetune PubMedBERT (Gu et al. 2021)..." The paper mentions software such as Scikit-learn, Keras, Gurobi, and PubMedBERT, but does not provide specific version numbers for them. (A sketch of the kind of feasibility check this tolerance applies to follows the table.)
Experiment Setup | Yes | "We use the same word embeddings, preprocessed data, data splits, metrics (Prec@8) and hyperparameters reported in the paper (Mullenbach et al. 2018). We only change the learning rate of the Adam optimiser to 0.001, as this improves results (as also found by Edin et al. (2023)). ... We use early stopping with a patience of 10 on the validation cross-entropy loss." (A training-loop skeleton with these settings follows the table.)
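
For context on the objects being evaluated: a BSL is a standard low-rank sigmoid output layer, in which n label logits are produced from an e-dimensional encoder representation with e much smaller than n (for the MIMIC-III model quoted above, n = 8921 and e = 500). The following is a minimal PyTorch sketch of such a layer; the class and variable names are illustrative and are not taken from the paper's repository.

```python
import torch
import torch.nn as nn

class BottleneckedSigmoidLayer(nn.Module):
    """Low-rank sigmoid output layer: n label logits from an e-dim representation.

    Illustrative sketch only; the paper's actual implementation lives at
    https://github.com/andreasgrv/sigmoid-bottleneck.
    """

    def __init__(self, e: int = 500, n: int = 8921):
        super().__init__()
        # The weight matrix has shape (n, e) with e << n: this is the sigmoid
        # bottleneck, since the logits are confined to an e-dimensional affine
        # subspace of R^n.
        self.out = nn.Linear(e, n)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.out(h)  # (batch, n) logits

# Predicted label set: labels whose logit is positive (sigmoid > 0.5).
layer = BottleneckedSigmoidLayer()
h = torch.randn(2, 500)            # stand-in encoder outputs
preds = (layer(h) > 0).int()       # (2, 8921) binary label vectors
```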
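The quoted eps = 10^-8 is a numerical tolerance for verifying whether a given label combination is argmaxable, i.e. whether some input representation can make exactly that set of logits positive. The paper reports using Gurobi for this; the sketch below instead uses scipy.optimize.linprog as a freely available stand-in, and the feasibility formulation shown is an assumption about how such a check can be posed, not the paper's verifier.

```python
import numpy as np
from scipy.optimize import linprog

def is_argmaxable(W: np.ndarray, b: np.ndarray, y: np.ndarray, eps: float = 1e-8) -> bool:
    """Check whether some input h makes exactly the labels in y positive.

    Linear feasibility problem: find h with s_i * (w_i . h + b_i) >= eps for all i,
    where s_i is +1 for labels in y and -1 otherwise. Illustrative sketch only.
    """
    s = 2.0 * y - 1.0                      # {0, 1} -> {-1, +1}
    A_ub = -(s[:, None] * W)               # rewrite as -s_i * (w_i . h) <= s_i * b_i - eps
    b_ub = s * b - eps
    res = linprog(c=np.zeros(W.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * W.shape[1], method="highs")
    return res.status == 0                 # 0 = feasible, 2 = infeasible

# Toy example: with a 2-dimensional bottleneck over 4 labels, some label
# combinations may admit no feasible input h.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 2)), rng.normal(size=4)
print(is_argmaxable(W, b, np.array([1.0, 0.0, 1.0, 0.0])))
```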
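The clinical-coding experiments reuse the Precision@8 metric from Mullenbach et al. (2018). A minimal sketch of that metric follows; the function name and array shapes are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def precision_at_k(scores: np.ndarray, targets: np.ndarray, k: int = 8) -> float:
    """Mean fraction of the top-k scored labels per example that are true labels.

    scores:  (num_examples, num_labels) real-valued label scores
    targets: (num_examples, num_labels) binary ground-truth label matrix
    """
    topk = np.argsort(-scores, axis=1)[:, :k]         # indices of the k highest scores
    hits = np.take_along_axis(targets, topk, axis=1)  # 1 where a top-k label is true
    return float(hits.mean())

scores = np.random.rand(3, 20)
targets = (np.random.rand(3, 20) > 0.8).astype(float)
print(precision_at_k(scores, targets, k=8))
```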
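Finally, the reported hyperparameters (Adam with learning rate 0.001, early stopping with patience 10 on the validation cross-entropy loss) can be summarised in a short training-loop skeleton. This is a hedged sketch: `model`, `train_loader`, and `val_loader` are placeholders, and all dataset- and encoder-specific details are omitted.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, max_epochs: int = 100, patience: int = 10):
    """Skeleton of the reported setup: Adam (lr=0.001) with early stopping
    (patience 10) on the validation binary cross-entropy loss."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    best_val, epochs_without_improvement = float("inf"), 0

    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            bce(model(x), y).backward()
            opt.step()

        model.eval()
        with torch.no_grad():
            val = sum(bce(model(x), y).item() for x, y in val_loader) / len(val_loader)

        if val < best_val:
            best_val, epochs_without_improvement = val, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stopping
    return model
```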