Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Sparse Continuous Distributions and Fenchel-Young Losses
Authors: André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the usefulness of the theoretical results developed in the previous sections by running experiments with continuous attention mechanisms with several choices of β-Gaussian densities (§9.1), and on heteroscedastic regression with continuous Fenchel-Young losses (§9.2). §9.1 Continuous attention mechanisms: We test our continuous attention mechanisms on two tasks: audio classification (1-d) and visual question answering (2-d). Table 3: Results on UrbanSound8k in terms of accuracy. Table 4: Accuracies of different models on the test-dev and test-standard splits of VQA-v2. §9.2 Heteroscedastic regression with Fenchel-Young losses. Table 5: Heteroscedastic regression test R²: proportion of variance explained by β-Gaussian regression models with learned variance. |
| Researcher Affiliation | Collaboration | André F. T. Martins EMAIL Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon ELLIS Unit (LUMLIS) & Unbabel, Lisbon, Portugal; Marcos Treviso EMAIL Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal; António Farinhas EMAIL Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal; Pedro M. Q. Aguiar EMAIL Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisbon ELLIS Unit (LUMLIS), Lisbon, Portugal; Mário A. T. Figueiredo EMAIL Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon ELLIS Unit (LUMLIS), Lisbon, Portugal; Mathieu Blondel EMAIL Google Research, Paris, France; Vlad Niculae EMAIL Language Technology Lab, University of Amsterdam, The Netherlands |
| Pseudocode | Yes | Algorithm 1: Continuous softmax attention: S = ℝᴰ, Ω = Ω₁, Gaussian RBFs. |
| Open Source Code | Yes | To encourage reproducibility and further experimentation by the research community, we release an easy-to-use Python package alongside our paper: https://github.com/deep-spin/sparse_continuous_distributions/. |
| Open Datasets | Yes | We use the UrbanSound8k dataset... https://urbansounddataset.weebly.com/. We report experiments with 2-d continuous attention on visual question answering, using the VQA-v2 dataset (Goyal et al., 2019)... We analyze the Breast Cancer Mortality and Population dataset from Rice (2006, Problem 57), accessed via statsmodels (Seabold and Perktold, 2010). |
| Dataset Splits | Yes | Since the dataset is officially split into 10 folds, we perform 10-fold cross-validation to evaluate our models. ... We used the VQA-v2 dataset (Goyal et al., 2019) with the standard splits (443K, 214K, and 453K question-image pairs for train/dev/test, the latter subdivided into test-dev, test-standard, test-challenge, and test-reserve). ... We leave out the 10% most populous counties as a test set, and fit a linear model with β-Gaussian data-dependent noise. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) are mentioned in the paper. |
| Software Dependencies | No | We used SpeechBrain (Ravanelli et al., 2021) to implement the input pipeline and the model... (no version specified). To encourage reproducibility and further experimentation by the research community, we release an easy-to-use Python package alongside our paper (no version specified for Python or specific libraries). |
| Experiment Setup | Yes | Table 6: Hyperparameters for audio classification — batch size 16; number of epochs 20; optimizer Adam; ℓ2 regularization 2×10⁻⁶; learning rate 0.001; conv. filters 128; conv. kernel size 5; conv. activation ReLU; conv. dropout 0.15; max-pooling size 3; Gaussian RBFs (§8.1): 128, with μ linearly spaced in [0, 1] and σ ∈ {0.1, 0.5}; ridge penalty λ 0.1; discrete attention (Bahdanau et al., 2015). Table 7: Hyperparameters for VQA — batch size 64; word embeddings size 300; input image features size 2048; input question features size 512; fused multimodal features size 1024; multi-head attention hidden size 512; number of MCA layers 6; number of attention heads 8; dropout rate 0.1; MLP size in flatten layers 512; optimizer Adam; base learning rate at epoch t starting from 1: min(2.5t × 10⁻⁵, 1 × 10⁻⁴); learning rate decay ratio 0.2 at epochs t ∈ {10, 12}; number of epochs 13. |
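For readers unfamiliar with the mechanism named in the Pseudocode row, the sketch below illustrates 1-d continuous softmax attention with Gaussian RBF value functions, using the 128 RBFs with means linearly spaced in [0, 1] reported in Table 6. It approximates the defining integral by quadrature on a grid rather than using the paper's closed-form expectations; the coefficient matrix `B`, the attention density's mean and variance, and the RBF width are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def gaussian_rbf(t, centers, width):
    # Gaussian radial basis functions psi_j(t); returns shape (len(t), len(centers)).
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

# 128 RBFs with means linearly spaced in [0, 1] (Table 6); width is a placeholder.
n_basis = 128
centers = np.linspace(0.0, 1.0, n_basis)
width = 0.1

# Integration grid over the unit interval.
t = np.linspace(0.0, 1.0, 1000)
dt = t[1] - t[0]

# Hypothetical value-map coefficients B (output dimension d = 64 chosen for illustration).
rng = np.random.default_rng(0)
B = rng.normal(size=(64, n_basis))

# Continuous softmax attention density: a Gaussian N(t; mu, sigma^2) on the grid,
# normalized so it integrates to ~1 (mu and sigma are placeholder values).
mu_att, sigma_att = 0.4, 0.05
p = np.exp(-0.5 * ((t - mu_att) / sigma_att) ** 2)
p /= p.sum() * dt

# Value function v(t) = B @ psi(t); context vector c = ∫ p(t) v(t) dt.
psi = gaussian_rbf(t, centers, width)        # (1000, 128)
v = psi @ B.T                                # (1000, 64)
c = (p[:, None] * v).sum(axis=0) * dt        # (64,)
```

With Gaussian RBFs and a Gaussian attention density, this integral actually has a closed form (a product of Gaussians integrates analytically), which is what makes the method practical; the quadrature here only serves to make the definition of the context vector concrete.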