Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Revisiting Over-smoothing in BERT from the Perspective of Graph
Authors: Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James Kwok
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiment results on various data sets illustrate the effect of our fusion method. |
| Researcher Affiliation | Collaboration | ¹Hong Kong University of Science and Technology, ²The University of Hong Kong, ³Huawei Noah's Ark Lab, ⁴Sun Yat-sen University |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | Our implementation is based on the Hugging Face's Transformers library (Wolf et al., 2020). |
| Open Datasets | Yes | GLUE (Wang et al., 2018a), SWAG (Zellers et al., 2018) and SQuAD (Rajpurkar et al., 2016; 2018) data sets. |
| Dataset Splits | Yes | we take the development set data of STS-B (Cer et al., 2017), CoLA (Warstadt et al., 2019), SQuAD (Rajpurkar et al., 2016) as input to the fine-tuned models |
| Hardware Specification | Yes | All experiments are performed on NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | Our implementation is based on the Hugging Face's Transformers library (Wolf et al., 2020). |
| Experiment Setup | Yes | The BERT model is stacked with 12 Transformer blocks (Section 2.1) with the following hyperparameters: number of tokens n = 128, number of self-attention heads h = 12, and hidden layer size d = 768. As for the feed-forward layer, we set the filter size d_ff to 3072 as in Devlin et al. (2019). ... The hyper-parameters of various downstream tasks are shown in Table 4. |
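
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object as a sanity check. The following is a minimal sketch, not the authors' code: the class `BertSetup` and its field names are hypothetical, and the parameter count assumes a standard Transformer encoder block (four attention projections with biases, a two-layer feed-forward network, and two layer norms), excluding embeddings and any task head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertSetup:
    """Hyperparameters quoted in the report; class and field names are illustrative."""

    num_blocks: int = 12    # stacked Transformer blocks
    seq_len: int = 128      # number of tokens n
    num_heads: int = 12     # self-attention heads h
    hidden_size: int = 768  # hidden layer size d
    ffn_size: int = 3072    # feed-forward filter size d_ff

    @property
    def head_dim(self) -> int:
        # Each head works on an equal slice of the hidden dimension.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    def block_params(self) -> int:
        # Assumed layout: Q, K, V and output projections (weights + biases),
        # two feed-forward linear layers, and two layer norms (scale + shift).
        d, dff = self.hidden_size, self.ffn_size
        attention = 4 * (d * d + d)
        feed_forward = (d * dff + dff) + (dff * d + d)
        layer_norms = 2 * (2 * d)
        return attention + feed_forward + layer_norms

    def encoder_params(self) -> int:
        # Encoder stack only; embeddings and task heads are not counted.
        return self.num_blocks * self.block_params()


setup = BertSetup()
print("per-head dimension:", setup.head_dim)
print("encoder parameters:", setup.encoder_params())
```

Under these assumptions each head has dimension 768 / 12 = 64, and the feed-forward layer is four times wider than the hidden layer. The task-specific hyperparameters that the paper lists in its Table 4 are not reproduced in this report, so they are left out of the sketch.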