MCAL: Minimum Cost Human-Machine Active Labeling

Authors: Hang Qiu, Krishna Chintalapudi, Ramesh Govindan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our approach on well-known public data sets such as Fashion-MNIST, CIFAR-10, CIFAR-100, and ImageNet. In some cases, our approach has 6× lower overall cost relative to human labeling of the entire data set, and is always cheaper than the cheapest competing strategy. Evaluations (§5) on various popular benchmark data sets show that MCAL achieves lower cost than the lowest-cost labeling achieved by an oracle active learning strategy.
Researcher Affiliation | Collaboration | Hang Qiu (University of Southern California), Krishna Chintalapudi (Microsoft Research), Ramesh Govindan (University of Southern California); {hangqiu, ramesh}@usc.edu, krchinta@microsoft.com
Pseudocode | Yes | The MCAL algorithm (Alg. 1, see Appendix A) takes as input an active learning metric M(.), the data set X, the classifier D (e.g., ResNet18), and parametric models for training cost (e.g., Eqn. 4) and for error rate as a function of training size (e.g., the truncated power law in Eqn. 3). (A sketch of these two parametric models appears after this table.)
Open Source Code | Yes | MCAL is available at https://github.com/hangqiu/MCAL
Open Datasets | Yes | We validate our approach on well-known public data sets such as Fashion-MNIST, CIFAR-10, CIFAR-100, and ImageNet.
Dataset Splits | No | The paper describes selecting a test set T (e.g., 5% of the full data set X) used to measure performance and estimate errors, but it does not explicitly define a separate validation split, distinct from the test set, for purposes such as hyperparameter tuning or early stopping. (A holdout-split sketch appears after this table.)
Hardware Specification | Yes | Training is performed on virtual machines with 4 NVIDIA K80 GPUs at a cost of 3.6 USD/hr.
Software Dependencies | No | The paper mentions 'Keras (2021)' but does not provide a specific version number for Keras or any other software dependency needed for replication.
Experiment Setup | Yes | At each active learning iteration, it trains the model over 200 epochs with a 10× learning rate reduction at epochs 80, 120, 160, and 180, and a mini-batch size of 256 samples (Keras (2021)). (A schedule sketch appears after this table.)
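
The Pseudocode row above lists the algorithm's inputs, including two parametric models. The sketch below illustrates how those two models could be expressed and fitted. It is a minimal illustration, not the released implementation: the function names, the linear form of the cost model, the fitting code, and the example numbers are assumptions; only the truncated-power-law error model and the role of the two models come from the quoted text.

```python
# Sketch of the two parametric models MCAL takes as input (see the
# Pseudocode row): a training-cost model (cf. Eqn. 4) and a truncated
# power law relating error rate to training-set size (cf. Eqn. 3).
# Function names, the linear cost form, and all numbers are assumptions.
import numpy as np
from scipy.optimize import curve_fit

def truncated_power_law(n, a, b, c):
    """Error rate as a function of training-set size n."""
    return a * np.power(n, -b) + c

def training_cost(n, alpha, beta):
    """Hypothetical linear stand-in for the parametric training-cost model:
    cost (e.g., USD of GPU time) of one training run on n samples."""
    return alpha * n + beta

# Fit the error model to error rates observed at a few labeled-set sizes
# from earlier active-learning iterations (numbers are made up).
sizes = np.array([1000.0, 2000.0, 4000.0, 8000.0, 16000.0])
errors = np.array([0.32, 0.25, 0.19, 0.15, 0.12])
(a, b, c), _ = curve_fit(truncated_power_law, sizes, errors,
                         p0=[1.0, 0.3, 0.05], bounds=(0.0, np.inf))

# Extrapolate the error expected if the classifier were trained on 50k samples.
print("projected error at 50k samples: %.3f" % truncated_power_law(5e4, a, b, c))
```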
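
The Dataset Splits row notes that the paper holds out a test set T of about 5% of X to measure error, with no separate validation split. Below is a minimal sketch of such a holdout, assuming a scikit-learn-style split; only the 5% figure comes from the description above, the rest is illustrative.

```python
# Minimal sketch of the ~5% holdout test set T described in the Dataset
# Splits row; scikit-learn's train_test_split is used only for illustration.
from sklearn.model_selection import train_test_split

def split_holdout_test(X, y, test_fraction=0.05, seed=0):
    """Hold out 5% of the data as the test set T; everything else stays in
    the pool that labeling draws from. No separate validation split is
    carved out, matching the observation in the table above."""
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=test_fraction, random_state=seed)
    return X_pool, X_test, y_pool, y_test
```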
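
The Experiment Setup row gives the training schedule: 200 epochs, a 10× learning rate reduction at epochs 80, 120, 160, and 180, and a mini-batch size of 256 in Keras. The sketch below expresses that schedule with Keras's LearningRateScheduler callback; the base learning rate, optimizer, loss, and model are assumptions, and interpreting each reduction as division by 10 is an inference from the quoted wording.

```python
# Sketch of the quoted training schedule: 200 epochs, 10x learning-rate
# reduction at epochs 80/120/160/180, mini-batch size 256, in Keras.
# Base learning rate, optimizer, and loss are assumptions for illustration.
import tensorflow as tf

def lr_schedule(epoch, lr=None):
    """Piecewise-constant schedule: divide an assumed base rate of 1e-3
    by 10 at each of epochs 80, 120, 160, and 180."""
    base_lr = 1e-3
    drops = sum(epoch >= e for e in (80, 120, 160, 180))
    return base_lr * (0.1 ** drops)

def train(model, x_train, y_train):
    """Train with the schedule and batch size from the Experiment Setup row."""
    model.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule(0)),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=200, batch_size=256,
              callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])
```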