Alignment Attention by Matching Key and Query Distributions
Authors: Shujian Zhang, Xinjie Fan, Huangjie Zheng, Korawat Tanwisuth, Mingyuan Zhou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. Our experiments show that the proposed alignment attention method outperforms state-of-the-art self-attentions in a wide variety of settings, including natural language understanding tasks, graph attention network, and visual question answering, in terms of accuracy and uncertainty estimation. |
| Researcher Affiliation | Academia | Shujian Zhang Xinjie Fan Huangjie Zheng Korawat Tanwisuth Mingyuan Zhou The University of Texas at Austin {szhang19, xfan, huangjie.zheng, korawat.tanwisuth}@utexas.edu mingyuan.zhou@mccombs.utexas.edu |
| Pseudocode | No | The paper describes the algorithms in prose and mathematical formulas but does not provide any structured pseudocode or algorithm blocks; a hedged sketch of the core idea is given after this table. |
| Open Source Code | Yes | The code is available at https://github.com/szhang42/alignment_attention |
| Open Datasets | Yes | We conduct experiments on eight benchmark datasets from General Language Understanding Evaluation (GLUE) [44] and two Stanford Question Answering Datasets (SQuAD) [45, 46]. ... Stanford Natural Language Inference (SNLI) corpus [52]... Multi-Genre Natural Language Inference (MNLI) [53]... Quora Question Pairs (QQP)... Twitter PPDB (TPPDB) [55]... Situations With Adversarial Generations (SWAG) and HellaSWAG (HSWAG) [56]... Cora, Citeseer and Pubmed [65]... VQA-v2 dataset [67] |
| Dataset Splits | Yes | For our in-domain and out-of-domain datasets, we split the development set in half to obtain a held-out, non-blind test set. |
| Hardware Specification | No | The paper mentions GPUs (Nvidia P100, V100) and training times, but these refer to models from other papers (Vaswani et al., Strubell et al.), not the hardware used for its own experiments. It also acknowledges the Texas Advanced Computing Center (TACC) for providing HPC resources, but gives no specific hardware details (e.g., exact GPU/CPU models or memory) for its experimental setup. |
| Software Dependencies | No | The paper mentions 'Huggingface PyTorch Transformer [47]' but does not provide specific version numbers for PyTorch or any other software dependencies needed for replication. |
| Experiment Setup | Yes | We conduct experiments on the VQA-v2 dataset [67] and follow the hyperparameters and other settings from Yu et al. [68]. ... In all experiments considered in the paper, which cover various noise levels and model sizes, we have simply fixed it as 0.01. |
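
Since the paper releases code but no pseudocode (see the Pseudocode row above), the following is a minimal sketch of the idea behind alignment attention: a standard attention head plus a penalty that pulls the empirical distributions of queries and keys together. The paper matches these distributions with a transport-based distance; the RBF-kernel MMD used here is a simpler stand-in, `ALIGN_WEIGHT = 0.01` mirrors the fixed regularization weight quoted in the Experiment Setup row (which we read as the alignment-loss weight), and names such as `aligned_attention` and `rbf_mmd` are illustrative, not taken from the released code at https://github.com/szhang42/alignment_attention.

```python
import torch
import torch.nn.functional as F

ALIGN_WEIGHT = 0.01  # fixed regularization weight quoted in the table above (assumed role)

def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD between point clouds x (n, d) and y (m, d) with an RBF kernel.
    Stand-in for the paper's transport-based distribution-matching distance."""
    def kern(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return kern(x, x).mean() + kern(y, y).mean() - 2 * kern(x, y).mean()

def aligned_attention(x, w_q, w_k, w_v):
    """One attention head that also returns a loss encouraging the
    empirical distributions of queries and keys to match."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # each (batch, seq, d_head)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    out = F.softmax(scores, dim=-1) @ v
    # Alignment penalty over the flattened query/key point clouds.
    align_loss = ALIGN_WEIGHT * rbf_mmd(q.reshape(-1, q.size(-1)),
                                        k.reshape(-1, k.size(-1)))
    return out, align_loss

# Toy usage with random inputs and projection weights.
x = torch.randn(2, 16, 64)                          # (batch, seq, d_model)
w_q, w_k, w_v = (torch.randn(64, 32) for _ in range(3))
out, align_loss = aligned_attention(x, w_q, w_k, w_v)
```

In training, `align_loss` would simply be added to the task loss (e.g., `loss = task_loss + align_loss`), so the distribution matching acts as a regularizer rather than changing the attention computation itself.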