Low-Budget Active Learning via Wasserstein Distance: An Integer Programming Approach
Authors: Rafid Mahmood, Sanja Fidler, Marc T Law
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results on several data sets show that our optimization approach is competitive with baselines and particularly outperforms them in the low budget regime where less than one percent of the data set is labeled. |
| Researcher Affiliation | Collaboration | NVIDIA; University of Toronto; Vector Institute |
| Pseudocode | Yes | In Appendix D.2, we provide a full algorithm description and Python pseudo-code. Finally, note that the techniques proposed here are not exhaustive and more can be used to improve the algorithm. |
| Open Source Code | No | The paper uses and cites external open-source libraries and solvers (e.g., Python Optimal Transport, COIN-OR/CBC Solver) and mentions using the available code of a baseline (WAAL), but it provides no statement or link releasing the authors' own implementation of the proposed method. |
| Open Datasets | Yes | We evaluate active learning in classification on three data sets: STL-10 (Coates et al., 2011), CIFAR10 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011). Our domain adaptation experiments use the Office-31 data set (Saenko et al., 2010). |
| Dataset Splits | No | For each experiment, we randomly partition the target data set into 70% for training and 30% for testing. (This split applies only to the domain adaptation experiments. For image classification, the paper mentions "default test sets" and general training, but gives no explicit training/validation split details, in the main text or appendix, sufficient for reproducibility.) |
| Hardware Specification | Yes | All experiments are performed on computers with a single NVIDIA V100 GPU card and 50 GB CPU memory. |
| Software Dependencies | No | The paper mentions using the Python Optimal Transport library (Flamary et al., 2021) and the COIN-OR/CBC Solver (Forrest & Lougee-Heimer, 2005), as well as Python-MIP, but it does not specify exact version numbers for these software dependencies, which is required for full reproducibility. |
| Experiment Setup | Yes | We set the solver optimality gap tolerance to 10% and a time limit of 180 seconds to solve W-RMP(Λ) in each iteration. We also warm-start our algorithm with the solution to the k-centers core set... We run Algorithm 1 with ε = 10^-3 and T = 3 hours for STL-10, CIFAR-10, and Office-31, and T = 6 hours for SVHN... for Wass. + EOC + P, we set (β+, β−) = (0.6, 0.99). |