IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

Authors: Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren

AAAI 2021, pp. 14621-14629 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization, and point out two major issues... We also propose a new network regularization method, isotropic batch normalization (IsoBN) to address the issues... This simple yet effective fine-tuning method yields about 1.0 absolute increment on the average of seven NLU tasks.
Researcher Affiliation | Academia | Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren; Department of Computer Science, University of Southern California, Los Angeles, CA; {zhouwenx, yuchen.lin, xiangren}@usc.edu
Pseudocode | Yes | The whole algorithm of IsoBN is shown in Algorithm 1.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We evaluate IsoBN on two PTLMs (BERT-base-cased and RoBERTa-large) and seven NLU tasks from the GLUE benchmark (Wang et al. 2019b).
Dataset Splits | Yes | We apply early stopping according to task-specific metrics on the dev set. We select the best combination of hyperparameters on the dev set. We fine-tune the PTLMs with 5 different random seeds and report the median and standard deviation of metrics on the dev set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | Our implementation of PTLMs is based on Hugging Face Transformers (Wolf et al. 2019). The model is fine-tuned with the AdamW (Loshchilov and Hutter 2019) optimizer... The paper mentions software tools but does not specify version numbers for them (e.g., "Hugging Face Transformers" without a version).
Experiment Setup | Yes | The model is fine-tuned with the AdamW (Loshchilov and Hutter 2019) optimizer using a learning rate chosen from {1e-5, 2e-5, 5e-5} and a batch size from {16, 32}. The learning rate is scheduled by a linear warm-up (Goyal et al. 2017) for the first 6% of steps, followed by a linear decay to 0. The maximum number of training epochs is set to 10. For IsoBN, the momentum α is set to 0.95, ε is set to 0.1, and the normalization strength β is chosen from {0.25, 0.5, 1}.
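
The Pseudocode row above notes that the full method is given as Algorithm 1 in the paper, but that algorithm is not reproduced in this record. Purely as an illustrative sketch of the idea described in the abstract (down-weighting [CLS] dimensions with dominating variance or high correlation to other dimensions, using running statistics with momentum α, stabilizer ε, and normalization strength β), a PyTorch layer might look like the following. The class name IsoBNSketch and the exact scaling rule are assumptions made for illustration, not the authors' Algorithm 1.

```python
import torch
import torch.nn as nn


class IsoBNSketch(nn.Module):
    """Illustrative sketch only: down-weights [CLS] dimensions with dominating
    variance or high correlation to other dimensions, using running statistics.
    This is a guess at the spirit of the method, not the paper's Algorithm 1."""

    def __init__(self, hidden_size, momentum=0.95, eps=0.1, beta=0.5):
        super().__init__()
        self.momentum = momentum  # alpha in the paper's hyperparameter list
        self.eps = eps
        self.beta = beta          # normalization strength
        self.register_buffer("running_std", torch.ones(hidden_size))
        self.register_buffer("running_corr", torch.eye(hidden_size))

    def forward(self, cls_emb):
        # cls_emb: (batch_size, hidden_size) [CLS] embeddings.
        if self.training:
            with torch.no_grad():
                std = cls_emb.std(dim=0, unbiased=False)
                centered = (cls_emb - cls_emb.mean(dim=0)) / (std + self.eps)
                corr = centered.t() @ centered / cls_emb.size(0)
                # Exponential moving average of the statistics (momentum = alpha).
                self.running_std.mul_(self.momentum).add_((1 - self.momentum) * std)
                self.running_corr.mul_(self.momentum).add_((1 - self.momentum) * corr)
        # Mean absolute correlation of each dimension with all dimensions.
        mean_abs_corr = self.running_corr.abs().mean(dim=1)
        # Scale down high-variance, highly correlated dimensions; beta controls
        # how strongly the normalization is applied.
        scale = 1.0 / (self.running_std * mean_abs_corr + self.eps)
        scale = scale / scale.max()
        return cls_emb * scale.pow(self.beta)
```

Such a layer would presumably sit between the [CLS] embedding and the classification head during fine-tuning; the actual update rule should be taken from Algorithm 1 in the paper.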
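
The Experiment Setup row maps directly onto a standard Hugging Face / PyTorch fine-tuning configuration. A minimal sketch, assuming AutoModelForSequenceClassification as the task wrapper and a placeholder training-set size (both assumptions; the record does not specify them):

```python
import torch
from transformers import AutoModelForSequenceClassification, get_linear_schedule_with_warmup

# Hyperparameters reported in the Experiment Setup row above.
learning_rate = 2e-5        # searched over {1e-5, 2e-5, 5e-5}
batch_size = 32             # searched over {16, 32}
max_epochs = 10
warmup_fraction = 0.06      # linear warm-up for the first 6% of steps, then linear decay to 0
isobn_momentum = 0.95       # alpha
isobn_eps = 0.1
isobn_beta = 0.5            # normalization strength, searched over {0.25, 0.5, 1}
num_train_examples = 10000  # placeholder; depends on the GLUE task

# The task wrapper is an assumption; the paper only states it builds on Hugging Face Transformers.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

steps_per_epoch = (num_train_examples + batch_size - 1) // batch_size
total_steps = steps_per_epoch * max_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(warmup_fraction * total_steps),
    num_training_steps=total_steps,  # decays linearly to 0 after warm-up
)
```

Per the Dataset Splits row, each selected configuration would then be run with 5 random seeds, with early stopping on the task-specific dev metric, and the median and standard deviation of dev scores reported.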