Actively Testing Your Model While It Learns: Realizing Label-Efficient Learning in Practice

Authors: Dayou Yu, Weishi Shi, Qi Yu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on both synthetic and real-world datasets to show the improved testing performance of the proposed ATL framework.
Researcher Affiliation | Academia | Rochester Institute of Technology, Rochester, NY 14623; University of North Texas, Denton, TX 76203
Pseudocode | Yes | Algorithm 1: Active Testing While Learning (ATL)
Open Source Code | Yes | The data and source code for replicating the results are provided in this link: https://github.com/ritmininglab/ATL.git
Open Datasets | Yes | The neural network model is applied to relatively larger scale image datasets, including MNIST, Fashion MNIST and CIFAR10 to demonstrate the practical performance of the proposed framework.
Dataset Splits | Yes | For the real-world experiments, we use the cross-entropy loss for risk evaluation. When we compare with the true risk R, we use the average evaluation results of the model on a large hold-out subset of the dataset (10,000 data samples) to represent R. The hold-out test set does not interact with the AL or AT processes, thus is considered a fair evaluation. In real-world experiments, we adopt the same procedure with the total pool containing 30,000 data samples. ... The initial training set contains 500 labels, while each of the 20 AL rounds adds 500 labels.
Hardware Specification | Yes | All experiments were run on clusters with either NVIDIA A6000 or NVIDIA A100 graphics cards and Intel Xeon Gold 6150 CPU processors.
Software Dependencies | No | The paper mentions using Python and various machine learning models (Gaussian processes, neural networks) but does not specify version numbers for any software libraries or dependencies, such as PyTorch or TensorFlow.
Experiment Setup | Yes | In all experiments, we use a CNN model and standard data transformation for each dataset. In each AL training round, we run 10 epochs for MNIST and Fashion MNIST and 50 epochs for CIFAR10. A threshold of 1 × 10⁻⁵ is used for probability outputs as required for the proposal q(x) computation [14] to avoid 0 denominators.
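The dataset-split row above pins down a concrete labeling budget: a 30,000-sample AL/AT pool, a disjoint 10,000-sample hold-out used only to estimate the true risk R, and 500 initial labels plus 20 AL rounds of 500 labels each. A minimal Python sketch of those sizes (the 40,000 total and the index handling are illustrative assumptions, not the authors' code):

```python
import numpy as np

# Hypothetical split mirroring the quoted setup: a pool of 30,000 samples
# for AL/AT and a disjoint 10,000-sample hold-out for estimating true risk R.
rng = np.random.default_rng(0)
n_total = 40_000                      # assumed total: 30k pool + 10k hold-out
idx = rng.permutation(n_total)
holdout = idx[:10_000]                # never touched by AL or AT
pool = idx[10_000:]                   # 30,000 samples available for labeling

# Labeling budget: 500 initial labels, then 20 AL rounds of 500 labels each.
init_labels, rounds, per_round = 500, 20, 500
total_labels = init_labels + rounds * per_round
print(total_labels)                   # -> 10500 labels after the final round
```

Under these numbers, even after the last AL round only about a third of the pool is ever labeled, which is what makes the choice of test-label acquisition (the AT side) matter.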
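The experiment-setup row notes that probability outputs are thresholded at 1 × 10⁻⁵ before computing the proposal q(x), to avoid zero denominators. A hedged sketch of how such clipping might look, assuming a loss-proportional proposal built from predictive entropy (the function name and the entropy stand-in are illustrative, not the paper's exact acquisition rule):

```python
import numpy as np

def proposal_from_probs(probs, eps=1e-5):
    """Illustrative proposal q(x) from softmax outputs.

    probs: (N, C) array of predicted class probabilities.
    eps:   floor applied to probabilities, as in the quoted setup,
           so that logs and the normalizing denominator stay finite.
    """
    p = np.clip(probs, eps, 1.0)             # threshold probability outputs
    p = p / p.sum(axis=1, keepdims=True)     # renormalize after clipping
    # Predictive entropy as a stand-in for the unknown per-sample loss;
    # strictly positive after clipping, so the sum below cannot be zero.
    expected_loss = -(p * np.log(p)).sum(axis=1)
    return expected_loss / expected_loss.sum()  # normalized proposal q(x)
```

Without the clip, a confident one-hot prediction (a zero probability) would produce log(0) and could zero out the normalizing denominator; with it, q(x) remains a valid distribution that still concentrates on uncertain samples.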