Bayesian Attention Belief Networks
Authors: Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a variety of language understanding tasks, we show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. |
| Researcher Affiliation | Academia | *Equal contribution. 1 The University of Texas at Austin, 2 Xidian University. |
| Pseudocode | No | The paper includes illustrations of model structures in figures (e.g., Figure 1, Figure 2), but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any structured steps formatted like code. |
| Open Source Code | No | The paper references third-party codebases that were used in their experiments, such as 'Huggingface PyTorch Transformer (Wolf et al., 2019)' and 'Allen NLP (Gardner et al., 2017)', but it does not contain any explicit statements or links indicating that the authors' own implementation code for BABN is open-source or publicly available. |
| Open Datasets | Yes | Our experiments are conducted on both the General Language Understanding Evaluation (GLUE) and Stanford Question Answering (SQuAD) datasets. For NMT we use the IWSLT dataset (Cettolo et al., 2014)... We conduct experiments on the commonly used benchmark, VQA-v2 (Goyal et al., 2017)... |
| Dataset Splits | Yes | The dataset is split into the training (80k images and 444k QA pairs), validation (40k images and 214k QA pairs), and testing (80k images and 448k QA pairs) sets. We perform evaluation on the validation set as the true labels for the test set are not publicly available. |
| Hardware Specification | No | The paper acknowledges 'the Texas Advanced Computing Center (TACC) for providing HPC resources', but it does not specify any particular GPU models (e.g., NVIDIA V100, RTX 2080 Ti), CPU models, or other detailed hardware specifications used for running their experiments. |
| Software Dependencies | No | The paper mentions using 'Huggingface PyTorch Transformer' and 'TextAttack (Morris et al., 2020)' but does not provide specific version numbers for these or other software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Table 6: Experimental settings of each task for in-domain pretrained language model (LR: learning rate, BSZ: batch size, DR: dropout rate, TS: training steps, WS: warmup steps, MSL: maximum sentence length). For RoBERTa, we finetune with a maximum of 3 epochs, batch size of 32, learning rate of 1e-5, gradient clip of 1.0, and weight decay of 0.1. |
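
Since the authors' own BABN implementation is not public, the sketch below shows only how the quoted RoBERTa fine-tuning hyperparameters (3 epochs, batch size 32, learning rate 1e-5, gradient clip 1.0, weight decay 0.1) could be set up with the Huggingface Transformers library the paper references. The GLUE task choice (MRPC), the `Trainer`-based loop, and the maximum sequence length of 128 are assumptions for illustration; this does not reproduce BABN's stochastic attention mechanism itself.

```python
# Minimal sketch of the reported RoBERTa fine-tuning configuration.
# Baseline fine-tuning only; BABN's Bayesian attention is NOT implemented here.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

raw = load_dataset("glue", "mrpc")  # assumed GLUE task for illustration
tok = AutoTokenizer.from_pretrained("roberta-base")

def encode(batch):
    # MRPC is a sentence-pair task; max_length=128 is an assumed value
    return tok(batch["sentence1"], batch["sentence2"],
               truncation=True, padding="max_length", max_length=128)

data = raw.map(encode, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("roberta-base",
                                                           num_labels=2)

args = TrainingArguments(
    output_dir="roberta_glue_repro",
    num_train_epochs=3,               # "maximum of 3 epochs"
    per_device_train_batch_size=32,   # batch size of 32
    learning_rate=1e-5,               # learning rate of 1e-5
    max_grad_norm=1.0,                # gradient clip of 1.0
    weight_decay=0.1,                 # weight decay of 0.1
)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["validation"])
trainer.train()
```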