Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
IALE: Imitating Active Learner Ensembles
Authors: Christoffer Löffler, Christopher Mutschler
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that we can (1) train a policy on image datasets such as MNIST, Fashion-MNIST, Kuzushiji-MNIST, Extended MNIST, CIFAR and SVHN, (2) transfer the policy between them, and (3) even transfer the policy between different classifier architectures (see Section 4). |
| Researcher Affiliation | Academia | Christoffer Löffler, Machine Learning and Data Analytics Lab, Friedrich-Alexander University Erlangen-Nürnberg (FAU), Carl-Thiersch-Straße 2b, 91052 Erlangen, Germany; Christopher Mutschler, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany |
| Pseudocode | Yes | Algorithm 1 Imitating Active Learner Ensembles |
| Open Source Code | Yes | The source code is available at https://github.com/crispchris/IALE and can be used to reproduce our experimental results. |
| Open Datasets | Yes | We use the image classification datasets MNIST (Le Cun et al., 1998), Fashion MNIST (FMNIST) (Xiao et al., 2017), Kuzushiji-MNIST (KMNIST) (Clanuwat et al., 2018), Extended MNIST (Cohen et al., 2017), CIFAR-10/-100 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011) for our evaluation. |
| Dataset Splits | Yes | The initial amount of labeled training data is 20 samples (class-balanced). At each step of the active learning process, 10 samples are labeled and added to the training data until a labeling budget B of 1,000 is reached. We use the AL heuristics MC-Dropout, Ensemble, Core Set, BADGE, Confidence and Entropy as experts, and use Dval with 100 labeled samples to score the acquisitions of the experts. ... We use 2,000 initial labels, an acq-size of 10 and B = 10,000 for CIFAR-10 and 1,000 initial labels, an acq-size of 1,000 and B = 16,000 for SVHN. |
| Hardware Specification | Yes | 6 minutes versus 215 minutes per epoch on one NVidia Tesla V100 GPU |
| Software Dependencies | No | The paper mentions "Adam optimizer (Kingma and Ba, 2015)" but does not specify any software names with version numbers. |
| Experiment Setup | Yes | We use the MNIST dataset as our source dataset on which we train our policy for 100 episodes, with each episode containing data from an AL cycle. The initial amount of labeled training data is 20 samples (class-balanced). At each step of the active learning process, 10 samples are labeled and added to the training data until a labeling budget B of 1,000 is reached. We use the AL heuristics MC-Dropout, Ensemble, Core Set, BADGE, Confidence and Entropy as experts, and use Dval with 100 labeled samples to score the acquisitions of the experts. The pool dataset is sampled with n = 100 at each AL iteration. We choose p = 0.5 for means of comparison with the baselines (based on preliminary experiments, see Appendix A.2.1 on Exploration-Exploitation). We train the policy's MLP on the growing list of state and action pairs using the binary cross entropy loss from Equation 2 and use the Adam optimizer (Kingma and Ba, 2015) for 30 epochs with a learning rate of 10⁻³, β₁ = 0.9, β₂ = 0.999, ϵ = 10⁻⁸, without any weight decay. |
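The optimizer configuration quoted in the Experiment Setup row (Adam with learning rate 10⁻³, β₁ = 0.9, β₂ = 0.999, ϵ = 10⁻⁸, no weight decay, binary cross entropy loss) can be sketched in plain NumPy. This is a minimal illustration of those hyperparameters only: the toy logistic model, data shapes, and step count below are assumptions for the sketch, not taken from the paper's released code.

```python
import numpy as np

# Hyperparameters as quoted in the Experiment Setup row.
LR, BETA1, BETA2, EPS = 1e-3, 0.9, 0.999, 1e-8

def bce_loss(p, y):
    """Binary cross entropy, as used for the policy's imitation loss."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def adam_step(w, grad, m, v, t):
    """One Adam update (Kingma and Ba, 2015) without weight decay."""
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad ** 2
    m_hat = m / (1 - BETA1 ** t)        # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** t)        # bias-corrected second moment
    w = w - LR * m_hat / (np.sqrt(v_hat) + EPS)
    return w, m, v

# Toy logistic model standing in for the policy MLP (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(8)
m, v = np.zeros_like(w), np.zeros_like(w)

def forward(w):
    return 1.0 / (1.0 + np.exp(-X @ w))  # sigmoid output probabilities

initial = bce_loss(forward(w), y)
for t in range(1, 301):                  # a few hundred steps for the demo
    p = forward(w)
    grad = X.T @ (p - y) / len(y)        # gradient of BCE w.r.t. w
    w, m, v = adam_step(w, grad, m, v, t)
final = bce_loss(forward(w), y)
```

With these settings the effective step size stays near the learning rate regardless of gradient scale, which is why the paper can report a single fixed rate across 30 epochs without a decay schedule.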