Stable Anisotropic Regularization
Authors: William Rudman, Carsten Eickhoff
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose I-STAR: IsoScore-based STable Anisotropic Regularization, a novel regularization method that can increase or decrease levels of isotropy in embedding space during training. I-STAR uses IsoScore, the first accurate measure of isotropy that is both differentiable and stable on mini-batch computations. In contrast to several previous works, we find that decreasing isotropy in contextualized embeddings improves performance on most tasks and models considered in this paper. (An illustrative sketch of such a penalty appears below the table.) |
| Researcher Affiliation | Academia | William Rudman, Department of Computer Science, Brown University, william_rudman@brown.edu; Carsten Eickhoff, School of Medicine, University of Tübingen, carsten.eickhoff@uni-tuebingen.de |
| Pseudocode | Yes | Algorithm 1: IsoScore Forward Pass; Algorithm 2: IsoScore |
| Open Source Code | Yes | Code: https://github.com/bcbi-edu/p_eickhoff_isoscore.git; Firstly, we have made all code used to produce the project publicly available and attached an anonymous version along with our submission. Further, we have released a pip-installable package of IsoScore to facilitate future work. |
| Open Datasets | Yes | In this paper, we fine-tune BERT (Devlin et al., 2019), ALBERT (Lan et al., 2020), and DistilBERT (Sanh et al., 2020) on nine common NLP benchmark tasks: SST-2 (Socher et al., 2013), QNLI (Rajpurkar et al., 2016a), RTE (Dagan et al., 2005), MRPC (Dolan and Brockett, 2005), QQP (Wang et al., 2018), CoLA (Warstadt et al., 2019), STS-B (Cer et al., 2017), SST-5 (Socher et al., 2013), and SQuAD (Rajpurkar et al., 2016b). |
| Dataset Splits | Yes | In this paper, we fine-tune BERT (Devlin et al., 2019), ALBERT (Lan et al., 2020), and DistilBERT (Sanh et al., 2020) on nine common NLP benchmark tasks: SST-2 (Socher et al., 2013), QNLI (Rajpurkar et al., 2016a), RTE (Dagan et al., 2005), MRPC (Dolan and Brockett, 2005), QQP (Wang et al., 2018), CoLA (Warstadt et al., 2019), STS-B (Cer et al., 2017), SST-5 (Socher et al., 2013), and SQuAD (Rajpurkar et al., 2016b). SST-2, QNLI, RTE, MRPC, STS-B, QQP, and CoLA are all datasets in the GLUE benchmark (Wang et al., 2018). These are standard benchmarks that include predefined validation splits. |
| Hardware Specification | Yes | After we perform our hyperparameter tuning, we fine-tune our models using two RTX 3090 GPUs, use mixed-precision training for all models/tasks, and set gradient accumulation steps to 2. (A sketch of such a training loop appears below the table.) |
| Software Dependencies | No | The paper mentions releasing code and using models like BERT, ALBERT, and DistilBERT, but it does not specify version numbers for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For each model and each task, we tune hyperparameters over batch size (8, 16, 32), training epochs (3, 4, 5), and learning rate (1e-5, 3e-5, 5e-5). For I-STAR, we tune for the optimal zeta (0.2, 0.4, 0.6, 0.8) and use the tuning-parameter values λ ∈ {-5, -3, -1, 1, 3, 5}. For CosReg, we use a tuning parameter of 1 in accordance with Gao et al. (2019). All reported performance metrics are calculated as an average over five random seeds to demonstrate the robustness of our results. ...set gradient accumulation steps to 2. (The reported grid is written out as a config sketch below.) |
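To make the I-STAR row concrete, here is a minimal PyTorch sketch of adding a differentiable isotropy penalty to a task loss. This is not the authors' IsoScore implementation (their repository linked above contains the real algorithm); the `isotropy_proxy` function and the sign convention for `lam` are illustrative assumptions.

```python
import torch

def isotropy_proxy(embeddings: torch.Tensor) -> torch.Tensor:
    # NOTE: a simplified, differentiable isotropy proxy, NOT the paper's
    # IsoScore -- see the authors' repo for the real algorithm. Returns 1.0
    # for a perfectly isotropic batch and roughly 1/d for a batch whose
    # variance collapses onto a single direction.
    x = embeddings - embeddings.mean(dim=0, keepdim=True)  # center the batch
    cov = (x.T @ x) / (x.shape[0] - 1)                     # (d, d) sample covariance
    eigvals = torch.linalg.eigvalsh(cov).clamp(min=0.0)    # principal-component variances
    p = eigvals / eigvals.sum()                            # spectrum as a distribution
    return 1.0 / (p.shape[0] * (p ** 2).sum())             # normalized inverse participation ratio

def istar_style_loss(task_loss: torch.Tensor,
                     embeddings: torch.Tensor,
                     lam: float) -> torch.Tensor:
    # lam > 0 penalizes isotropy (pushing it down, which the paper reports
    # helps most tasks); lam < 0 encourages it. This sign convention is an
    # assumption for illustration, not taken from the paper.
    return task_loss + lam * isotropy_proxy(embeddings)
```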
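The hardware row describes a standard PyTorch mixed-precision loop with gradient accumulation. A minimal sketch, assuming a PyTorch model whose forward pass returns an object with a `.loss` attribute; `model`, `optimizer`, and `dataloader` are placeholders, not names from the paper's code:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
accum_steps = 2  # gradient accumulation steps, as reported

optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    with autocast():                              # forward pass in mixed precision
        loss = model(**batch).loss / accum_steps  # scale loss for accumulation
    scaler.scale(loss).backward()                 # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                    # unscale gradients, then step
        scaler.update()                           # adjust the loss scale
        optimizer.zero_grad()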
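The experiment-setup row translates directly into a search grid. A config sketch of the reported values; the variable names and loop structure are hypothetical:

```python
from itertools import product

# Search grid exactly as reported in the paper's setup.
grid = {
    "batch_size":    [8, 16, 32],
    "epochs":        [3, 4, 5],
    "learning_rate": [1e-5, 3e-5, 5e-5],
    "zeta":          [0.2, 0.4, 0.6, 0.8],   # I-STAR only
    "lam":           [-5, -3, -1, 1, 3, 5],  # I-STAR tuning parameter lambda
}
SEEDS = range(5)  # reported metrics are averaged over five random seeds

# Shared hyperparameters are tuned for every model/task pair; zeta and lam
# apply only to the I-STAR runs.
for bs, ep, lr in product(grid["batch_size"], grid["epochs"], grid["learning_rate"]):
    pass  # fine-tune and evaluate here
```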