Alignment Attention by Matching Key and Query Distributions

Authors: Shujian Zhang, Xinjie Fan, Huangjie Zheng, Korawat Tanwisuth, Mingyuan Zhou

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
"On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. Our experiments show that the proposed alignment attention method outperforms state-of-the-art self-attentions in a wide variety of settings, including natural language understanding tasks, graph attention network, and visual question answering, in terms of accuracy and uncertainty estimation."

Researcher Affiliation: Academia
"Shujian Zhang, Xinjie Fan, Huangjie Zheng, Korawat Tanwisuth, Mingyuan Zhou. The University of Texas at Austin. {szhang19, xfan, huangjie.zheng, korawat.tanwisuth}@utexas.edu, mingyuan.zhou@mccombs.utexas.edu"

Pseudocode: No
The paper describes its algorithms in prose and mathematical formulas but does not provide any structured pseudocode or algorithm blocks.

Open Source Code: Yes
"The code is available at https://github.com/szhang42/alignment_attention"

Open Datasets: Yes
"We conduct experiments on eight benchmark datasets from General Language Understanding Evaluation (GLUE) [44] and two Stanford Question Answering Datasets (SQuAD) [45, 46]. ... Stanford Natural Language Inference (SNLI) corpus [52]... Multi-Genre Natural Language Inference (MNLI) [53]... Quora Question Pairs (QQP)... Twitter PPDB (TPPDB) [55]... Situations With Adversarial Generations (SWAG) and HellaSWAG (HSWAG) [56]... Cora, Citeseer and Pubmed [65]... VQA-v2 dataset [67]"

Dataset Splits: Yes
"For our in-domain and out-of-domain datasets, we split the development set in half to obtain a held-out, non-blind test set."

Hardware Specification: No
The paper mentions GPUs (Nvidia P100, V100) and training times, but these refer to models from other papers (Vaswani et al., Strubell et al.), not to the hardware used for the authors' own experiments. It also acknowledges the Texas Advanced Computing Center (TACC) for providing HPC resources but gives no specific hardware details (e.g., exact GPU/CPU models, memory) for the experimental setup.

Software Dependencies: No
The paper mentions the "Huggingface PyTorch Transformer [47]" but does not provide version numbers for PyTorch or any other software dependencies needed for replication.

Experiment Setup: Yes
"We conduct experiments on the VQA-v2 dataset [67] and follow the hyperparameters and other settings from Yu et al. [68]. ... In all experiments considered in the paper, which cover various noise levels and model sizes, we have simply fixed it as 0.01."