Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Differentiable Model Selection for Ensemble Learning

Authors: James Kotary, Vincenzo Di Vito, Ferdinando Fioretto

IJCAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The e2e-CEL training is evaluated on several vision classification tasks: digit classification on MNIST dataset [Deng, 2012], age-range estimation on UTKFace dataset [Zhifei Zhang, 2017], image classification on CIFAR10 dataset [Krizhevsky et al., 2009], and emotion detection on FER2013 dataset [Liu et al., 2016]. Table 2 reports the best accuracy over all the ensemble sizes k of ensembles trained by e2e-CEL along with that of each baseline ensemble model, where each are formed using the same pre-trained base learners.
Researcher Affiliation	Academia	James Kotary¹, Vincenzo Di Vito¹ and Ferdinando Fioretto¹ (¹University of Virginia) EMAIL, fioretto@virginia.edu
Pseudocode	Yes	Algorithm 1 summarizes the e2e-CEL procedure for training a selection net. Algorithm 1: Training the Selection Net
Open Source Code	No	The paper does not contain an explicit statement about releasing the source code for the described methodology, nor does it provide a direct link to a code repository.
Open Datasets	Yes	digit classification on MNIST dataset [Deng, 2012], age-range estimation on UTKFace dataset [Zhifei Zhang, 2017], image classification on CIFAR10 dataset [Krizhevsky et al., 2009], and emotion detection on FER2013 dataset [Liu et al., 2016].
Dataset Splits	Yes	In each dataset there is an implied train/test/validation split, so that evaluation of a trained model is always performed on its test portion. Where this distinction is needed, the symbols Xtrain, Xvalid, Xtest are used.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments, only stating that the selection net uses the 'same CNN architecture as that of the base learner models'.
Software Dependencies	No	The paper refers to 'standard automatic differentiation employed in machine learning libraries [Paszke et al., 2019]' (which cites PyTorch), but no specific version numbers for any software dependencies or libraries are provided.
Experiment Setup	No	The paper describes the general approach to training base learners (e.g., specializing on classes) and the selection net's architecture, and mentions `alpha` in Algorithm 1, but it does not provide specific numerical hyperparameter values such as learning rate, batch size, or number of epochs for the main experimental setup.