Towards Inferential Reproducibility of Machine Learning Research
Authors: Michael Hagmann, Philipp Meier, Stefan Riezler
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We exemplify the methods introduced above on an NLP example from the paperswithcode.com open resource, namely the BART+R3F fine-tuning algorithm presented by Aghajanyan et al. (2021) for the task of text summarization, evaluated on the CNN/Daily Mail (Hermann et al., 2015) and Reddit TIFU (Kim et al., 2019) datasets. |
| Researcher Affiliation | Academia | Michael Hagmann¹, Philipp Meier¹, Stefan Riezler¹,²; Computational Linguistics¹ & IWR², Heidelberg University, Germany; {hagmann,meier,riezler}@cl.uni-heidelberg.de |
| Pseudocode | No | The general form of an LMEM is Y = Xβ + Zb + ϵ (Eq. 1), where X is an (N × k)-matrix and Z is an (N × m)-matrix, called model- or design-matrices, which relate the unobserved vectors β and b to Y. β is a k-vector of fixed effects and b is an m-dimensional random vector called the random effects vector. ϵ is an N-dimensional vector called the error component. The random vectors are assumed to have the following distributions: b ∼ N(0, ψ_θ), ϵ ∼ N(0, Λ_θ) (Eq. 2). (A minimal illustrative fit of such a model is sketched below the table.) |
| Open Source Code | Yes | Code (R and Python) for the toolkit and sample applications are publicly available. |
| Open Datasets | Yes | We exemplify the methods introduced above on an NLP example from the paperswithcode.com open resource, namely the BART+R3F fine-tuning algorithm presented by Aghajanyan et al. (2021) for the task of text summarization, evaluated on the CNN/Daily Mail (Hermann et al., 2015) and Reddit TIFU (Kim et al., 2019) datasets. |
| Dataset Splits | No | The paper gives detailed meta-parameter settings for the text summarization experiments, but reports final results as maxima over training runs started from 10 unknown random seeds. Furthermore, the regularization parameter is specified as a choice of λ ∈ [0.001, 0.01, 0.1], and the noise type as a choice from [U, N]. Using the given settings, we started the BART+R3F code from 5 new random seeds and the BART-large baseline from 18 random seeds on 4 Nvidia Tesla V100 GPUs each with 32 GB RAM and an update frequency of 8. All models were trained for 20-30 epochs using a loss-based stopping criterion. |
| Hardware Specification | Yes | Using the given settings, we started the BART+R3F code from 5 new random seeds and the BART-large baseline from 18 random seeds on 4 Nvidia Tesla V100 GPUs each with 32 GB RAM and an update frequency of 8. |
| Software Dependencies | No | Code (R and Python) for the toolkit and sample applications are publicly available. |
| Experiment Setup | Yes | The paper gives detailed meta-parameter settings for the text summarization experiments, but reports final results as maxima over training runs started from 10 unknown random seeds. Furthermore, the regularization parameter is specified as a choice of λ ∈ [0.001, 0.01, 0.1], and the noise type as a choice from [U, N]. Using the given settings, we started the BART+R3F code from 5 new random seeds and the BART-large baseline from 18 random seeds on 4 Nvidia Tesla V100 GPUs each with 32 GB RAM and an update frequency of 8. All models were trained for 20-30 epochs using a loss-based stopping criterion. |
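
The LMEM quoted in the Pseudocode row above lends itself to a short worked example. The following is a minimal sketch in Python using statsmodels, assuming a hypothetical data frame of per-seed evaluation scores for a baseline and a fine-tuned system; the column names (`system`, `seed`, `score`) and the simulated numbers are illustrative assumptions, not the authors' toolkit or their data.

```python
# Minimal sketch of fitting a linear mixed effects model, Y = Xβ + Zb + ϵ,
# with a fixed effect for the system (baseline vs. fine-tuned) and a random
# intercept per training seed. All names and numbers below are illustrative
# assumptions; this is not the paper's toolkit or its data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_seeds, n_evals = 5, 4  # hypothetical: 5 seeds per system, 4 evaluations each

rows = []
for system, mean_score in [("baseline", 40.0), ("r3f", 41.0)]:
    for seed in range(n_seeds):
        seed_offset = rng.normal(0.0, 0.5)   # random effect b ~ N(0, ψ_θ)
        for _ in range(n_evals):
            noise = rng.normal(0.0, 0.3)     # error component ϵ ~ N(0, Λ_θ)
            rows.append({
                "system": system,
                "seed": f"{system}_seed{seed}",
                "score": mean_score + seed_offset + noise,
            })
df = pd.DataFrame(rows)

# Fixed effect (β): systematic score difference between the two systems.
# Random effect (b): per-seed intercept capturing seed-induced variation.
model = smf.mixedlm("score ~ system", df, groups=df["seed"])
result = model.fit()
print(result.summary())
```

In this assumed setup, the fitted coefficient for `system` plays the role of an entry of β in Eq. (1), while the per-seed intercepts correspond to the random-effects vector b, so seed-induced variation is separated from the systematic difference between systems.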