MCAL: Minimum Cost Human-Machine Active Labeling
Authors: Hang Qiu, Krishna Chintalapudi, Ramesh Govindan
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach on well known public data sets such as Fashion-MNIST, CIFAR-10, CIFAR-100, and ImageNet. In some cases, our approach has 6× lower overall cost relative to human labeling the entire data set, and is always cheaper than the cheapest competing strategy. Evaluations (§5) on various popular benchmark data sets show that MCAL achieves a cost lower than the lowest-cost labeling achieved by an oracle active learning strategy. |
| Researcher Affiliation | Collaboration | Hang Qiu¹, Krishna Chintalapudi², Ramesh Govindan¹ (¹University of Southern California, ²Microsoft Research); {hangqiu, ramesh}@usc.edu, krchinta@microsoft.com |
| Pseudocode | Yes | The MCAL algorithm (Alg. 1, see appendix A) takes as input an active learning metric M(.), the data set X, the classifier D (e.g., RESNET18), and parametric models for training cost (e.g., Eqn. 4) and for error rate as a function of training size (e.g., the truncated power law in Eqn. 3). A hedged sketch of fitting such an error model appears after the table. |
| Open Source Code | Yes | MCAL is available at https://github.com/hangqiu/MCAL |
| Open Datasets | Yes | We validate our approach on well known public data sets such as Fashion-MNIST, CIFAR-10, CIFAR-100, and ImageNet. |
| Dataset Splits | No | The paper describes selecting a 'test set T' (e.g., |T|=5% of |X|) to test and measure performance and estimate errors. However, it does not explicitly define a separate 'validation set' split for purposes like hyperparameter tuning or early stopping during training, distinct from the test set. |
| Hardware Specification | Yes | Training is performed on virtual machines with 4 NVIDIA K80 GPUs at a cost of 3.6 USD/hr. |
| Software Dependencies | No | The paper mentions 'Keras (2021)' but does not provide a specific version number for Keras or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | At each active learning iteration, it trains the model over 200 epochs with a 10× learning rate reduction at 80, 120, 160, 180 epochs, and a mini-batch size of 256 samples (Keras (2021)). A hedged Keras sketch of this schedule appears after the table. |
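
The Pseudocode row lists a truncated-power-law error model (Eqn. 3) and a training-cost model (Eqn. 4) among MCAL's inputs. The exact parametric forms are given in the paper; the sketch below is a minimal illustration, assuming a generic truncated power law error(n) = a·n^(-b) + c fitted to measured (training-set size, error) points with `scipy.optimize.curve_fit`. The measurement values and the `power_law_error` helper are hypothetical, not taken from the paper.

```python
# Hedged sketch: fit an assumed truncated-power-law error model of the kind
# MCAL uses to predict classifier error as a function of training-set size.
import numpy as np
from scipy.optimize import curve_fit

def power_law_error(n, a, b, c):
    """Assumed form: error decays as a * n^(-b) toward a floor c."""
    return a * np.power(n, -b) + c

# Illustrative measurements only (training-set size, observed test error).
sizes = np.array([1_000, 2_000, 5_000, 10_000, 20_000], dtype=float)
errors = np.array([0.35, 0.28, 0.20, 0.15, 0.12])

params, _ = curve_fit(power_law_error, sizes, errors, p0=[1.0, 0.5, 0.05], maxfev=10_000)
a_hat, b_hat, c_hat = params

# MCAL-style use of the fit: extrapolate error to a larger training-set size
# when weighing further human labeling against machine labeling.
print(f"Predicted error at 50000 samples: {power_law_error(50_000, a_hat, b_hat, c_hat):.3f}")
```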
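The Experiment Setup row reports a 200-epoch Keras run with 10× learning-rate reductions at epochs 80, 120, 160, and 180 and a mini-batch size of 256. The sketch below encodes that schedule as a `tf.keras.callbacks.LearningRateScheduler`; the initial learning rate, optimizer, and commented model lines are assumptions for illustration and are not stated in the paper.

```python
# Hedged sketch of the reported schedule: 200 epochs, 10x learning-rate drops
# at epochs 80, 120, 160, 180, batch size 256 (Keras).
import tensorflow as tf

INITIAL_LR = 0.1  # assumed starting learning rate (not given in the paper)

def step_decay(epoch, lr):
    """Divide the initial learning rate by 10 at each reported milestone epoch."""
    drops = sum(epoch >= milestone for milestone in (80, 120, 160, 180))
    return INITIAL_LR / (10 ** drops)

lr_callback = tf.keras.callbacks.LearningRateScheduler(step_decay)

# Illustrative usage (model, optimizer, and data are placeholders):
# model.compile(optimizer=tf.keras.optimizers.SGD(INITIAL_LR, momentum=0.9),
#               loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=200, batch_size=256, callbacks=[lr_callback])
```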