Stable Anisotropic Regularization
Authors: William Rudman, Carsten Eickhoff
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose I-STAR: IsoScore-based STable Anisotropic Regularization, a novel regularization method that can increase or decrease levels of isotropy in embedding space during training. I-STAR uses IsoScore, the first accurate measure of isotropy that is both differentiable and stable on mini-batch computations. In contrast to several previous works, we find that decreasing isotropy in contextualized embeddings improves performance on most tasks and models considered in this paper. (An illustrative sketch of such a penalty appears below the table.) |
| Researcher Affiliation | Academia | William Rudman, Department of Computer Science, Brown University, william_rudman@brown.edu; Carsten Eickhoff, School of Medicine, University of Tübingen, carsten.eickhoff@uni-tuebingen.de |
| Pseudocode | Yes | Algorithm 1: IsoScore Forward Pass; Algorithm 2: IsoScore |
| Open Source Code | Yes | Code: https://github.com/bcbi-edu/p_eickhoff_isoscore.git; Firstly, we have made all code used to produce the project publicly available and attached an anonymous version along with our submission. Further, we have released a pip-installable package of IsoScore to facilitate future work. |
| Open Datasets | Yes | In this paper, we fine-tune BERT (Devlin et al., 2019), ALBERT (Lan et al., 2020), and DistilBERT (Sanh et al., 2020) on nine common NLP benchmark tasks: SST-2 (Socher et al., 2013), QNLI (Rajpurkar et al., 2016a), RTE (Dagan et al., 2005), MRPC (Dolan and Brockett, 2005), QQP (Wang et al., 2018), CoLA (Warstadt et al., 2019), STS-B (Cer et al., 2017), SST-5 (Socher et al., 2013), and SQuAD (Rajpurkar et al., 2016b). |
| Dataset Splits | Yes | In this paper, we fine-tune BERT (Devlin et al., 2019), ALBERT (Lan et al., 2020), and DistilBERT (Sanh et al., 2020) on nine common NLP benchmark tasks: SST-2 (Socher et al., 2013), QNLI (Rajpurkar et al., 2016a), RTE (Dagan et al., 2005), MRPC (Dolan and Brockett, 2005), QQP (Wang et al., 2018), CoLA (Warstadt et al., 2019), STS-B (Cer et al., 2017), SST-5 (Socher et al., 2013), and SQuAD (Rajpurkar et al., 2016b). SST-2, QNLI, RTE, MRPC, STS-B, QQP, and CoLA are all datasets in the GLUE benchmark (Wang et al., 2018). These are standard benchmarks that include predefined validation splits. |
| Hardware Specification | Yes | After we perform our hyperparameter tuning, we fine-tune our models using two RTX 3090 GPUs, use mixed-precision training for all models/tasks, and set gradient accumulation steps to 2. (A sketch of such a training loop appears below the table.) |
| Software Dependencies | No | The paper mentions releasing code and using models like BERT, ALBERT, and DistilBERT, but it does not specify version numbers for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For each model and each task, we tune hyperparameters over batch size (8, 16, 32), training epochs (3, 4, 5), and learning rate (1e-5, 3e-5, 5e-5). For I-STAR, we tune for the optimal zeta (0.2, 0.4, 0.6, 0.8) and use the tuning-parameter values λ ∈ {-5, -3, -1, 1, 3, 5}. For CosReg, we use a tuning parameter of 1 in accordance with Gao et al. (2019). All reported performance metrics are calculated as an average over five random seeds to demonstrate the robustness of our results. ...set gradient accumulation steps to 2. (The reported grid is written out as a config sketch below.) |
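To make the I-STAR row concrete, here is a minimal PyTorch sketch of adding a differentiable isotropy penalty to a task loss. This is not the authors' IsoScore implementation (their repository linked above contains the real algorithm); the `isotropy_proxy` function and the sign convention for `lam` are illustrative assumptions.

```python
import torch

def isotropy_proxy(embeddings: torch.Tensor) -> torch.Tensor:
    # NOTE: a simplified, differentiable isotropy proxy, NOT the paper's
    # IsoScore -- see the authors' repo for the real algorithm. Returns 1.0
    # for a perfectly isotropic batch and roughly 1/d for a batch whose
    # variance collapses onto a single direction.
    x = embeddings - embeddings.mean(dim=0, keepdim=True)  # center the batch
    cov = (x.T @ x) / (x.shape[0] - 1)                     # (d, d) sample covariance
    eigvals = torch.linalg.eigvalsh(cov).clamp(min=0.0)    # principal-component variances
    p = eigvals / eigvals.sum()                            # spectrum as a distribution
    return 1.0 / (p.shape[0] * (p ** 2).sum())             # normalized inverse participation ratio

def istar_style_loss(task_loss: torch.Tensor,
                     embeddings: torch.Tensor,
                     lam: float) -> torch.Tensor:
    # lam > 0 penalizes isotropy (pushing it down, which the paper reports
    # helps most tasks); lam < 0 encourages it. This sign convention is an
    # assumption for illustration, not taken from the paper.
    return task_loss + lam * isotropy_proxy(embeddings)
```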
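The hardware row describes a standard PyTorch mixed-precision loop with gradient accumulation. A minimal sketch, assuming a PyTorch model whose forward pass returns an object with a `.loss` attribute; `model`, `optimizer`, and `dataloader` are placeholders, not names from the paper's code:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
accum_steps = 2  # gradient accumulation steps, as reported

optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    with autocast():                              # forward pass in mixed precision
        loss = model(**batch).loss / accum_steps  # scale loss for accumulation
    scaler.scale(loss).backward()                 # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                    # unscale gradients, then step
        scaler.update()                           # adjust the loss scale
        optimizer.zero_grad()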
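The experiment-setup row translates directly into a search grid. A config sketch of the reported values; the variable names and loop structure are hypothetical:

```python
from itertools import product

# Search grid exactly as reported in the paper's setup.
grid = {
    "batch_size":    [8, 16, 32],
    "epochs":        [3, 4, 5],
    "learning_rate": [1e-5, 3e-5, 5e-5],
    "zeta":          [0.2, 0.4, 0.6, 0.8],   # I-STAR only
    "lam":           [-5, -3, -1, 1, 3, 5],  # I-STAR tuning parameter lambda
}
SEEDS = range(5)  # reported metrics are averaged over five random seeds

# Shared hyperparameters are tuned for every model/task pair; zeta and lam
# apply only to the I-STAR runs.
for bs, ep, lr in product(grid["batch_size"], grid["epochs"], grid["learning_rate"]):
    pass  # fine-tune and evaluate here
```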