Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Revisiting Over-smoothing in BERT from the Perspective of Graph

Authors: Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James Kwok

ICLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiment results on various data sets illustrate the effect of our fusion method.
Researcher Affiliation | Collaboration | Hong Kong University of Science and Technology; The University of Hong Kong; Huawei Noah's Ark Lab; Sun Yat-sen University
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | Our implementation is based on Hugging Face's Transformers library (Wolf et al., 2020).
Open Datasets | Yes | GLUE (Wang et al., 2018a), SWAG (Zellers et al., 2018), and SQuAD (Rajpurkar et al., 2016; 2018) data sets.
Dataset Splits | Yes | We take the development set data of STS-B (Cer et al., 2017), CoLA (Warstadt et al., 2019), and SQuAD (Rajpurkar et al., 2016) as input to the fine-tuned models.
Hardware Specification | Yes | All experiments are performed on NVIDIA Tesla V100 GPUs.
Software Dependencies | No | Our implementation is based on Hugging Face's Transformers library (Wolf et al., 2020).
Experiment Setup | Yes | The BERT model is stacked with 12 Transformer blocks (Section 2.1) with the following hyperparameters: number of tokens n = 128, number of self-attention heads h = 12, and hidden layer size d = 768. For the feed-forward layer, the filter size dff is set to 3072 as in Devlin et al. (2019). ... The hyper-parameters of various downstream tasks are shown in Table 4.
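The BERT-base hyperparameters quoted in the Experiment Setup row can be summarized as a small configuration sketch. This is illustrative only: the dictionary key names below are hypothetical and do not come from the paper's implementation, which is based on Hugging Face's Transformers library.

```python
# Illustrative sketch of the BERT-base setup quoted above;
# key names are hypothetical, not from the paper's code.
bert_base = {
    "num_layers": 12,      # stacked Transformer blocks
    "num_tokens": 128,     # n: sequence length
    "num_heads": 12,       # h: self-attention heads
    "hidden_size": 768,    # d: hidden layer size
    "filter_size": 3072,   # dff: feed-forward filter size, as in Devlin et al. (2019)
}

# In standard multi-head attention, each head operates on d / h dimensions.
head_dim = bert_base["hidden_size"] // bert_base["num_heads"]
print(head_dim)  # 64
```

Per-task fine-tuning hyper-parameters are not repeated here; the paper lists them in its Table 4.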