Regularisation for Efficient Softmax Parameter Generation in Low-Resource Text Classifiers

Authors: Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the method on a diverse set of NLP tasks and show that the model decreases in performance when trained on this data without further adjustments. Therefore, we introduce and evaluate two methods for regularising the training process and show that they not only improve performance when used in conjunction with the new training data but also improve average performance when training only on the original data, compared to the baseline.
Researcher Affiliation | Academia | Daniel Grießhaber1, Johannes Maucher1, Ngoc Thang Vu2; 1 Institute for Applied Artificial Intelligence (IAAI), Stuttgart Media University, Nobelstraße 10, 70569 Stuttgart; 2 Institute for Natural Language Processing (IMS), University of Stuttgart, Pfaffenwaldring 5B, 70569 Stuttgart; {griesshaber, maucher}@hdm-stuttgart.de, thang.vu@ims.uni-stuttgart.de
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | The implementation used for all experiments in this work is available for reference online: https://gitlab.mi.hdm-stuttgart.de/griesshaber/metanlp
Open Datasets | Yes | We follow Bansal et al. [2020a] in the selection of a subset of datasets from the GLUE [Wang et al., 2018] meta dataset, specifically the MNLI (matched and mismatched), MRPC, QNLI, QQP, RTE, SST-2 and SNLI datasets [Bowman et al., 2015] for training the meta-model. From the Amazon Review Corpus [Blitzer et al., 2007], we use the Product Categories Books, DVD, Electronics and Kitchen. The CoNLL-2003 shared task [Tjong Kim Sang and De Meulder, 2003] is a named entity recognition task. The Airline dataset1 consists of tweets about North American airlines... The Disaster dataset2 contains tweets... The Emotion dataset3 contains N=13 different emotions... The Political Audience, Political Bias and Political Message4 tasks all use the same input texts... (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | For each task, a training (support) set D_i^s ∈ T_i and a validation (query) set D_i^q ∈ T_i is sampled. A k-shot subset of a dataset is created by choosing k random samples from each of the N classes. We aggregate the mean accuracy for 10 different subsets for each dataset with k ∈ {4, 8, 16} and report the average accuracy and the standard deviation between runs. (A sampling sketch follows the table.)
Hardware Specification | Yes | We performed our experiments on compute nodes with 4x NVIDIA 2080 Ti GPUs, where training took 72 hours per experiment with the original dataset and an additional 96 hours for the combined dataset.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) were mentioned.
Experiment Setup | Yes | Table 1 shows a matrix of model and dataset configurations used in the experiments. The Dataset column describes which datasets were used while training the model, showing whether additionally generated training data was available or not. The Loss column indicates whether cross-entropy (L_ce) or the mixed-loss approach described in section 3.2 (L_mλ) is used, where the number is the set value for the parameter λ in the experiment. A dot in column α indicates that attention was used in the parameter generator to calculate sample weights as described in section 3.2. λ is a hyper-parameter of the model that needs tuning. (A mixed-loss sketch follows the table.)
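
The Open Datasets row names the GLUE subsets used for meta-training plus several low-resource target tasks. The snippet below is a minimal sketch of one way to fetch the publicly hosted GLUE and SNLI portions with the Hugging Face `datasets` library; the paper does not state which loading tooling was used, so the library choice and configuration names are assumptions rather than the authors' pipeline.

import sys  # minimal sketch, assuming the Hugging Face `datasets` library
from datasets import load_dataset

# GLUE subsets named in the paper for training the meta-model.
glue_tasks = ["mnli", "mrpc", "qnli", "qqp", "rte", "sst2"]
glue = {name: load_dataset("glue", name) for name in glue_tasks}

# SNLI is hosted as a standalone dataset.
snli = load_dataset("snli")

# The remaining corpora (Amazon Review Corpus, CoNLL-2003, the Twitter tasks
# and the Political tasks) are distributed separately and are not fetched here.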
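
The Dataset Splits row describes the evaluation protocol: k random samples per class form a k-shot subset, and accuracy is averaged over 10 such subsets for each k in {4, 8, 16}. The sketch below illustrates that protocol under stated assumptions; the function names and the `train_and_score` callable are hypothetical placeholders, not identifiers from the released code.

import random
import statistics
from collections import defaultdict

def build_k_shot_subset(dataset, k, seed=None):
    """Choose k random samples from each of the N classes.

    `dataset` is assumed to be an iterable of (text, label) pairs; the name
    and signature are illustrative only.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append((text, label))
    subset = []
    for examples in by_label.values():
        subset.extend(rng.sample(examples, k))
    rng.shuffle(subset)
    return subset

def evaluate_k_shot(dataset, k, train_and_score, n_subsets=10):
    """Mean and standard deviation of accuracy over 10 random k-shot subsets,
    mirroring the protocol quoted in the Dataset Splits row. `train_and_score`
    is a placeholder callable that fine-tunes on a support set and returns
    query-set accuracy."""
    accs = [train_and_score(build_k_shot_subset(dataset, k, seed=i))
            for i in range(n_subsets)]
    return statistics.mean(accs), statistics.stdev(accs)

# Reported numbers would then be gathered for k in {4, 8, 16}.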
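
The Experiment Setup row distinguishes plain cross-entropy (L_ce) from the mixed loss L_mλ of section 3.2, with λ as a tunable hyper-parameter. The excerpt does not spell out the second loss component, so the sketch below only assumes a convex combination between cross-entropy and some regularising term weighted by λ; the actual composition of L_mλ is defined in the paper, and both the form and the role of λ here are assumptions.

import torch
import torch.nn.functional as F

def mixed_loss(logits, targets, reg_term, lam):
    """Assumed form of the mixed loss: a λ-weighted convex combination of
    cross-entropy and a regularisation term. This is an illustrative
    stand-in, not the definition from section 3.2 of the paper."""
    ce = F.cross_entropy(logits, targets)     # L_ce on the current batch
    return lam * ce + (1.0 - lam) * reg_term  # λ trades off the two terms

# Example usage with an arbitrary λ = 0.5 and a placeholder regulariser value.
logits = torch.randn(8, 3)                   # 8 samples, 3 classes
targets = torch.randint(0, 3, (8,))
reg = torch.tensor(0.1)
loss = mixed_loss(logits, targets, reg, lam=0.5)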