Better Fine-Tuning by Reducing Representational Collapse
Authors: Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including CNN/DailyMail, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We first measure performance by fine-tuning on a range of tasks and languages. The subsequent sections examine why trust-region-inspired methods, including ours, outperform standard fine-tuning. |
| Researcher Affiliation | Industry | Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta & Naman Goyal Facebook {armenag,akshats,anchit,naman}@fb.com Luke Zettlemoyer & Sonal Gupta Facebook {lsz, sonalgupta}@fb.com |
| Pseudocode | No | The paper describes its methods using mathematical equations and textual descriptions but does not include any explicit pseudocode or algorithm blocks (a hedged sketch of the described objective is given below the table). |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We first test R3F and R4F on sentence classification tasks from the GLUE benchmark (Wang et al., 2018). We select the same subset of GLUE tasks reported by prior work in this space (Jiang et al., 2019): MNLI (Williams et al., 2018), QQP (Iyer et al., 2017), RTE (Bentivogli et al., 2009), QNLI (Rajpurkar et al., 2016), MRPC (Dolan & Brockett, 2005), CoLA (Warstadt et al., 2018), and SST-2 (Socher et al., 2013). We also evaluate on the popular XNLI benchmark, containing 15 languages (Conneau et al., 2018). For abstractive summarization, due to its additional complexity and computational cost, we look at three datasets: CNN/DailyMail (Hermann et al., 2015), Gigaword (Napoles et al., 2012), and Reddit TIFU (Kim et al., 2018). |
| Dataset Splits | Yes | We report the performance of all models on the GLUE development set. We present our best results on the GLUE development set for various fine-tuning methods applied to the RoBERTa Large model. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, or details about the computing environment used for experiments. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For our GLUE-related experiments, both full fine-tuning and probing, the following parameters are used. Table 5: Task-specific hyperparameters for GLUE experiments (Learning Rate, Max Updates, Max Sentences). Table 6: Hyperparameters for R3F and R4F experiments on GLUE (Optimizer, LR Scheduler, Dropout, Weight Decay, Warmup Updates, λ, Noise Types, σ); see the sketches after this table for how these terms enter the objective. |
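
Since the paper presents its method only through equations (see the Pseudocode row above), the following is a minimal PyTorch sketch of the R3F-style objective it describes: a task loss on clean input embeddings plus a symmetric KL penalty between the model's output distributions on clean and noise-perturbed embeddings. The hooks `model.embed` and `model.forward_from_embeddings`, and the default values of `lam` and `sigma`, are illustrative assumptions; λ, σ, and the noise type correspond to the hyperparameters listed in the Experiment Setup row, not to values confirmed here.

```python
import torch
import torch.nn.functional as F

def r3f_loss(model, input_ids, labels, lam=1.0, sigma=1e-5, noise="normal"):
    """One training step of an R3F-style objective (a sketch, not the authors' code).

    A task loss is computed on clean input embeddings; a second forward pass on
    noise-perturbed embeddings gives an output distribution whose symmetric KL
    divergence from the clean distribution is penalized with weight `lam`.
    `model.embed` and `model.forward_from_embeddings` are hypothetical hooks for
    a model that exposes its embedding layer.
    """
    embeds = model.embed(input_ids)                       # (batch, seq, dim)
    logits_clean = model.forward_from_embeddings(embeds)  # (batch, num_classes)

    # Sample parametric noise in embedding space: N(0, sigma^2 I) or U(-sigma, sigma).
    if noise == "normal":
        z = torch.randn_like(embeds) * sigma
    else:
        z = torch.empty_like(embeds).uniform_(-sigma, sigma)
    logits_noisy = model.forward_from_embeddings(embeds + z)

    task_loss = F.cross_entropy(logits_clean, labels)

    p = F.log_softmax(logits_clean, dim=-1)
    q = F.log_softmax(logits_noisy, dim=-1)
    # Symmetric KL between the clean and perturbed output distributions.
    sym_kl = (F.kl_div(q, p, log_target=True, reduction="batchmean")
              + F.kl_div(p, q, log_target=True, reduction="batchmean"))

    return task_loss + lam * sym_kl
```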
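
The R4F variant named in the same hyperparameter tables additionally constrains the classification head to be Lipschitz; the paper realizes this with spectral normalization, which could be applied as in the sketch below. The 1024-dimensional input (RoBERTa Large's hidden size) and the `num_classes` value are illustrative assumptions.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# R4F = R3F plus a Lipschitz constraint on the classification head,
# implemented here with PyTorch's built-in spectral normalization.
num_classes = 3  # e.g. a 3-way NLI head; purely illustrative
classification_head = spectral_norm(nn.Linear(1024, num_classes))
```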