Towards Inferential Reproducibility of Machine Learning Research

Authors: Michael Hagmann, Philipp Meier, Stefan Riezler

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We exemplify the methods introduced above on an NLP example from the paperswithcode.com open resource, namely the BART+R3F fine-tuning algorithm presented by Aghajanyan et al. (2021) for the task of text summarization, evaluated on the CNN/Daily Mail (Hermann et al., 2015) and Reddit TIFU (Kim et al., 2019) datasets."
Researcher Affiliation | Academia | "Michael Hagmann¹, Philipp Meier¹, Stefan Riezler¹,²; ¹Computational Linguistics and ²IWR, Heidelberg University, Germany; {hagmann,meier,riezler}@cl.uni-heidelberg.de"
Pseudocode | No | "The general form of an LMEM is Y = Xβ + Zb + ϵ (1), where X is an (N × k)-matrix and Z is an (N × m)-matrix, called model- or design-matrices, which relate the unobserved vectors β and b to Y. β is a k-vector of fixed effects and b is an m-dimensional random vector called the random effects vector. ϵ is an N-dimensional vector called the error component. The random vectors are assumed to have the following distributions: b ∼ N(0, ψ_θ), ϵ ∼ N(0, Λ_θ) (2)." (A fitting sketch for this model follows the table.)
Open Source Code | Yes | "Code (R and Python) for the toolkit and sample applications are publicly available."
Open Datasets | Yes | "We exemplify the methods introduced above on an NLP example from the paperswithcode.com open resource, namely the BART+R3F fine-tuning algorithm presented by Aghajanyan et al. (2021) for the task of text summarization, evaluated on the CNN/Daily Mail (Hermann et al., 2015) and Reddit TIFU (Kim et al., 2019) datasets."
Dataset Splits | No | "The paper gives detailed meta-parameter settings for the text summarization experiments, but reports final results as maxima over training runs started from 10 unknown random seeds. Furthermore, the regularization parameter is specified as a choice of λ ∈ [0.001, 0.01, 0.1], and the noise type as a choice from [U, N]. Using the given settings, we started the BART+R3F code from 5 new random seeds and the BART-large baseline from 18 random seeds on 4 Nvidia Tesla V100 GPUs, each with 32 GB RAM and an update frequency of 8. All models were trained for 20-30 epochs using a loss-based stopping criterion."
Hardware Specification | Yes | "Using the given settings, we started the BART+R3F code from 5 new random seeds and the BART-large baseline from 18 random seeds on 4 Nvidia Tesla V100 GPUs, each with 32 GB RAM and an update frequency of 8."
Software Dependencies | No | "Code (R and Python) for the toolkit and sample applications are publicly available."
Experiment Setup | Yes | "The paper gives detailed meta-parameter settings for the text summarization experiments, but reports final results as maxima over training runs started from 10 unknown random seeds. Furthermore, the regularization parameter is specified as a choice of λ ∈ [0.001, 0.01, 0.1], and the noise type as a choice from [U, N]. Using the given settings, we started the BART+R3F code from 5 new random seeds and the BART-large baseline from 18 random seeds on 4 Nvidia Tesla V100 GPUs, each with 32 GB RAM and an update frequency of 8. All models were trained for 20-30 epochs using a loss-based stopping criterion." (See the configuration-grid sketch after the table.)
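
The Pseudocode row above quotes the paper's linear mixed effects model (LMEM), Y = Xβ + Zb + ϵ, which separates fixed effects (e.g., the system under comparison) from random effects (e.g., random seeds). Below is a minimal sketch of how such a model can be fit in Python, assuming a hypothetical results table with columns score, system, and seed; these column names and the file results.csv are illustrative and not part of the released toolkit.

    # Fit an LMEM: "system" as a fixed effect, a random intercept per seed.
    # Hypothetical input: results.csv with columns score, system, seed.
    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.read_csv("results.csv")

    # Y = Xβ + Zb + ϵ: the per-seed intercepts play the role of the
    # random-effects vector b ~ N(0, ψ_θ); the residuals are ϵ ~ N(0, Λ_θ).
    model = smf.mixedlm("score ~ system", data, groups=data["seed"])
    result = model.fit()
    print(result.summary())

In R, an analogous lme4 call would be lmer(score ~ system + (1 | seed), data); the paper's exact model specifications should be taken from its released code.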
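
The Dataset Splits and Experiment Setup rows describe the replication's meta-parameter space: λ in [0.001, 0.01, 0.1], noise type in [U, N], and several fresh random seeds. The sketch below only illustrates how such a grid of configurations can be enumerated before the resulting scores are analyzed with an LMEM; the seed values are placeholders, not the seeds actually used.

    # Enumerate the meta-parameter grid described in the table above.
    from itertools import product

    lambdas = [0.001, 0.01, 0.1]
    noise_types = ["U", "N"]   # noise type as quoted: U (uniform) or N (normal)
    seeds = [1, 2, 3, 4, 5]    # placeholder values for the 5 new random seeds

    for lam, noise, seed in product(lambdas, noise_types, seeds):
        config = {"lambda": lam, "noise": noise, "seed": seed}
        print(config)          # in practice: launch one fine-tuning run per config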