Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
IALE: Imitating Active Learner Ensembles
Authors: Christoffer Löffler, Christopher Mutschler
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that we can (1) train a policy on image datasets such as MNIST, Fashion-MNIST, Kuzushiji-MNIST, Extended MNIST, CIFAR and SVHN, (2) transfer the policy between them, and (3) even transfer the policy between different classifier architectures (see Section 4). |
| Researcher Affiliation | Academia | Christoffer Löffler, Machine Learning and Data Analytics Lab, Friedrich-Alexander University Erlangen-Nürnberg (FAU), Carl-Thiersch-Straße 2b, 91052 Erlangen, Germany; Christopher Mutschler, Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany |
| Pseudocode | Yes | Algorithm 1 Imitating Active Learner Ensembles |
| Open Source Code | Yes | The source code is available at https://github.com/crispchris/IALE and can be used to reproduce our experimental results. |
| Open Datasets | Yes | We use the image classification datasets MNIST (Le Cun et al., 1998), Fashion MNIST (FMNIST) (Xiao et al., 2017), Kuzushiji-MNIST (KMNIST) (Clanuwat et al., 2018), Extended MNIST (Cohen et al., 2017), CIFAR-10/-100 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011) for our evaluation. |
| Dataset Splits | Yes | The initial amount of labeled training data is 20 samples (class-balanced). At each step of the active learning process, 10 samples are labeled and added to the training data until a labeling budget B of 1,000 is reached. We use the AL heuristics MC-Dropout, Ensemble, Core Set, BADGE, Confidence and Entropy as experts, and use Dval with 100 labeled samples to score the acquisitions of the experts. ... We use 2,000 initial labels, an acq-size of 10 and B = 10,000 for CIFAR-10 and 1,000 initial labels, an acq-size of 1,000 and B = 16,000 for SVHN. |
| Hardware Specification | Yes | 6 minutes versus 215 minutes per epoch on one NVidia Tesla V100 GPU |
| Software Dependencies | No | The paper mentions "Adam optimizer (Kingma and Ba, 2015)" but does not specify any software names with version numbers. |
| Experiment Setup | Yes | We use the MNIST dataset as our source dataset on which we train our policy for 100 episodes, with each episode containing data from an AL cycle. The initial amount of labeled training data is 20 samples (class-balanced). At each step of the active learning process, 10 samples are labeled and added to the training data until a labeling budget B of 1,000 is reached. We use the AL heuristics MC-Dropout, Ensemble, Core Set, BADGE, Confidence and Entropy as experts, and use Dval with 100 labeled samples to score the acquisitions of the experts. The pool dataset is sampled with n = 100 at each AL iteration. We choose p = 0.5 for means of comparison with the baselines (based on preliminary experiments, see Appendix A.2.1 on Exploration-Exploitation). We train the policy's MLP on the growing list of state and action pairs using the binary cross entropy loss from Equation 2 and use the Adam optimizer (Kingma and Ba, 2015) for 30 epochs with a learning rate of 10⁻³, β₁ = 0.9, β₂ = 0.999, ϵ = 10⁻⁸, without any weight decay. |
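The optimizer configuration quoted in the Experiment Setup row (Adam with learning rate 10⁻³, β₁ = 0.9, β₂ = 0.999, ϵ = 10⁻⁸, no weight decay, binary cross entropy loss) can be sketched in plain NumPy. This is a minimal illustration of those hyperparameters only: the toy logistic model, data shapes, and step count below are assumptions for the sketch, not taken from the paper's released code.

```python
import numpy as np

# Hyperparameters as quoted in the Experiment Setup row.
LR, BETA1, BETA2, EPS = 1e-3, 0.9, 0.999, 1e-8

def bce_loss(p, y):
    """Binary cross entropy, as used for the policy's imitation loss."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def adam_step(w, grad, m, v, t):
    """One Adam update (Kingma and Ba, 2015) without weight decay."""
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad ** 2
    m_hat = m / (1 - BETA1 ** t)        # bias-corrected first moment
    v_hat = v / (1 - BETA2 ** t)        # bias-corrected second moment
    w = w - LR * m_hat / (np.sqrt(v_hat) + EPS)
    return w, m, v

# Toy logistic model standing in for the policy MLP (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(8)
m, v = np.zeros_like(w), np.zeros_like(w)

def forward(w):
    return 1.0 / (1.0 + np.exp(-X @ w))  # sigmoid output probabilities

initial = bce_loss(forward(w), y)
for t in range(1, 301):                  # a few hundred steps for the demo
    p = forward(w)
    grad = X.T @ (p - y) / len(y)        # gradient of BCE w.r.t. w
    w, m, v = adam_step(w, grad, m, v, t)
final = bce_loss(forward(w), y)
```

With these settings the effective step size stays near the learning rate regardless of gradient scale, which is why the paper can report a single fixed rate across 30 epochs without a decay schedule.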