Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes
Authors: Erica Zhang, Fangzhao Zhang, Mert Pilanci
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We exemplify the effectiveness of our proposed active learning method against popular deep active learning baselines via both synthetic data experiments and a sentiment classification task on real datasets. We validate our proposed training and active learning methods through extensive experiments, comparing them with various popular baselines from scikit-activeml (Kottke et al., 2021) and Deep AL (Huang, 2021). Synthetic data experiments are presented in Section 7.1, and real data experiments are presented in Section 7.2. |
| Researcher Affiliation | Academia | 1Department of Management Science and Engineering, Stanford University 2Department of Electrical Engineering, Stanford University. Correspondence to: Erica Zhang <EMAIL>, Fangzhao Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Training NN with Cutting-Plane Method; Algorithm 2: Cutting-plane AL for Binary Classification with Query Synthesis; Algorithm 3: Generic (Linear) Cutting-Plane AL; Algorithm 4: Cutting-plane AL for Binary Classification with Limited Queries; Algorithm 5: Cutting-plane AL for Binary Classification with Inexact Cutting; Algorithm 6: Cutting-plane AL for Regression with Limited Queries; Algorithm 7: Linear Cutting-Plane AL for Regression; Algorithm 8: Cutting-plane AL for Binary Classification with Limited Queries using Final Solve; Algorithm 9: Deep Active Learning Baseline |
| Open Source Code | Yes | The implementation of our cutting-plane active learning (CPAL) method is made available at https://github.com/pilancilab/cpal. |
| Open Datasets | Yes | To demonstrate the real-world applicability of our method, we combine the Microsoft Phi-2 model (Javaheripi et al., 2023) with our two-layer ReLU binary classifier for sentiment classification on the IMDB (Maas et al., 2011) movie review dataset. |
| Dataset Splits | Yes | Binary Classification on Synthetic Spiral: We use a synthetic dataset of two intertwined spirals with positive and negative labels (details in Appendix H.4), which contains 100 data points with a 4:1 train-test split. Quadratic Regression: Using 100 noise-free data points from y = x², we perform a 4:1 train-test split with a query budget of 20 points. For the IMDB experiments, we randomly pick 50 training data points and 20 test data points for ease of computation. |
| Hardware Specification | No | The paper mentions "our implementation relies on CPU-based convex program solver" but does not specify any particular CPU models, memory, or other hardware details. |
| Software Dependencies | Yes | In the training of Algorithm 4, we implement the center function for analytic center retrieval due to its simple computation formula (see Definition 6.1). Since the center retrieval problem is of convex minimization form, we solve it with CVXPY (Diamond & Boyd, 2016) and default to MOSEK (MOSEK ApS, 2024) as our solver. In our implementation, we use CVXPY and default to solver CLARABEL (Goulart & Chen, 2024). We also survey popular AL algorithms from the scikit-activeml library (Kottke et al., 2021) and the Deep AL package (Huang, 2021). We use the Python package skorch, a scikit-learn compatible wrapper for PyTorch models (Viehmann et al., 2019). |
| Experiment Setup | Yes | For all deep active learning baselines used in the Spiral case, we set the hyper-parameters according to Table 2. We empirically selected learning rate 0.001 from the choices {0.1, 0.01, 0.001} as it tends to give the best result among the three for all baselines in both the classification and regression tasks. We also choose the number of epochs to be 2000 as it gives the best result among the choices {20, 200, 2000} and shows significant improvement over both 20 and 200. Table 2 (hyper-parameters of deep AL baselines training networks with the Stochastic Gradient Descent (SGD) optimizer): Epochs: 2000; Learning Rate: 0.001; Train Batch Size: 16; Test Batch Size: 10; Momentum: 0.9; Weight Decay: 0.003 |