Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Authors: Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Fei Wu, Jiwei Li
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on neural machine translation, language modeling, graph representation learning and image classification, we demonstrate SAC is competitive with state-of-the-art models while significantly reducing memory cost. |
| Researcher Affiliation | Collaboration | Computer Science Department, Zhejiang University; Shannon.AI |
| Pseudocode | No | The paper describes the Edge Predictor and its steps in detail within the text and using a figure, but it does not include a formal pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | Following Vaswani et al. (2017); Ott et al. (2018); Kitaev et al. (2020), we used the standard WMT 2014 English-German dataset to test the proposed model. The dataset consists of about 4.5 million sentence pairs. Sentences are encoded using BPE (Sennrich et al., 2016), which has a shared source-target vocabulary of about 37000 tokens. |
| Dataset Splits | Yes | "Table 1: BLEU scores on the newstest2013 for development and newstest2014 for test for WMT English-German." and "We train all models with Adam (Kingma and Ba, 2014) and early stopping on the validation set." |
| Hardware Specification | Yes | Models are run on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions optimizers (Adam) and tools (BPE, Stanford Dependency parser) and model architectures, but does not specify software dependencies with version numbers (e.g., 'Python 3.x', 'PyTorch 1.x', 'CUDA 11.x'). |
| Experiment Setup | Yes | For fair comparison, we used the Adam optimizer (Kingma and Ba, 2014) with β1 = 0.9, β2 = 0.98 and ϵ = 10⁻⁹ for all models. Label smoothing (Szegedy et al., 2016) with ϵ = 0.1 is applied for all models. For the base setup, following Vaswani et al. (2017), the dimensionality of inputs and outputs d_model is set to 512, and the inner-layer has dimensionality d_ff is set to 2,048. |
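
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object for anyone attempting a reproduction. The following is a minimal sketch, not the authors' code: the names `BaseSetup` and `optimizer_kwargs` are hypothetical, the Adam ϵ is taken as 10⁻⁹ (the value used by Vaswani et al. (2017), whose base setup the row follows), and no framework is assumed because the paper does not state its software dependencies. Settings that the quoted text does not give, such as the learning rate, are deliberately left out.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BaseSetup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""

    adam_beta1: float = 0.9        # Adam β1
    adam_beta2: float = 0.98       # Adam β2
    adam_eps: float = 1e-9         # Adam ϵ, assumed to be 10^-9
    label_smoothing: float = 0.1   # label-smoothing ϵ
    d_model: int = 512             # input/output dimensionality (base setup)
    d_ff: int = 2048               # inner-layer dimensionality (base setup)


def optimizer_kwargs(cfg: BaseSetup) -> dict:
    """Group the Adam settings as a (betas, eps) pair of keyword arguments.

    Many Adam implementations accept this shape, but the exact
    argument names depend on the framework, which the paper does not name.
    """
    return {"betas": (cfg.adam_beta1, cfg.adam_beta2), "eps": cfg.adam_eps}


cfg = BaseSetup()
print(asdict(cfg))
print(optimizer_kwargs(cfg))
```

Freezing the dataclass keeps the recorded values from being modified by accident during a run, which is convenient when the configuration is logged alongside results.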