Training-Free Neural Active Learning with Initialization-Robustness Guarantees
Authors: Apivich Hemachandra, Zhongxiang Dai, Jasraj Singh, See-Kiong Ng, Bryan Kian Hsiang Low
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our EV-GP criterion is highly correlated with both initialization robustness and generalization performance, and show that it consistently outperforms baseline methods in terms of both desiderata, especially in situations with limited initial data or large batch sizes. |
| Researcher Affiliation | Academia | 1Department of Computer Science, National University of Singapore, Republic of Singapore 2School of Computer Science and Engineering, Nanyang Technological University, Republic of Singapore. Correspondence to: Zhongxiang Dai <dzx@nus.edu.sg>. |
| Pseudocode | Yes | Algorithm 1 EV-GP+MS (a hypothetical selection-loop sketch is given after the table) |
| Open Source Code | Yes | The code for the experiments can be found at https://github.com/apivich-h/init-robust-al. |
| Open Datasets | Yes | Meanwhile the real-life training data are taken from the UCI Machine Learning Repository (Dua & Graff). ... MNIST (Deng, 2012). ... EMNIST (Cohen et al., 2017). ... SVHN (Netzer et al., 2011). ... CIFAR100 (Krizhevsky, 2009). |
| Dataset Splits | No | No explicit training/validation/test split percentages or sample counts are provided. The paper mentions splitting data into a 'pool' and 'test data' but does not detail validation splits or what fraction of the pool is used for training. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU/GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like JAX, Neural-Tangents, PyTorch, FUNCTOOLS, and Adam optimizer, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | In all of the regression experiments, the model used is a two-layer multi-layer perceptron with width of 512 and with bias. We set σ_W = 1.0 and σ_b = 0.1. The NNs are optimized using gradient descent with step size 0.01. ... For all the models, we train using stochastic gradient descent with learning rate of 0.1 and weight decay of 0.005. The models are trained with a batch size of 32 for 100 epochs. (A configuration sketch is given after the table.) |
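
The quoted Algorithm 1 (EV-GP+MS) is not reproduced here. As a reading aid, below is a minimal sketch of a greedy, training-free selection loop in the spirit of the EV-GP criterion: a candidate batch is scored by the average NTK-GP posterior output variance it leaves on a reference set, using the infinite-width kernel from Neural-Tangents. The function names (`avg_posterior_variance`, `greedy_select`), the jitter value, and the use of the pool itself as the reference set are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal, hypothetical sketch of greedy EV-GP-style batch selection using
# the infinite-width NTK from neural-tangents.  Names and the jitter value
# are illustrative assumptions, not the authors' implementation.
import numpy as np
import jax.numpy as jnp
from neural_tangents import stax

# Two-layer MLP in the infinite-width limit with sigma_W = 1.0, sigma_b = 0.1
# (the width argument only matters for finite networks, not for kernel_fn).
_, _, kernel_fn = stax.serial(
    stax.Dense(512, W_std=1.0, b_std=0.1),
    stax.Relu(),
    stax.Dense(1, W_std=1.0, b_std=0.1),
)

def avg_posterior_variance(x_ref, x_train, jitter=1e-6):
    """Average NTK-GP posterior output variance on x_ref given labels at x_train."""
    k_rr_diag = jnp.diag(kernel_fn(x_ref, x_ref, 'ntk'))
    k_rt = kernel_fn(x_ref, x_train, 'ntk')                  # (n_ref, n_train)
    k_tt = kernel_fn(x_train, x_train, 'ntk')
    k_tt = k_tt + jitter * jnp.eye(x_train.shape[0])
    solve = jnp.linalg.solve(k_tt, k_rt.T)                   # (n_train, n_ref)
    return float(jnp.mean(k_rr_diag - jnp.sum(k_rt * solve.T, axis=1)))

def greedy_select(x_pool, x_ref, batch_size):
    """Greedily add the pool point that most reduces the average posterior variance."""
    selected = []
    for _ in range(batch_size):
        candidates = [i for i in range(x_pool.shape[0]) if i not in selected]
        scores = [avg_posterior_variance(x_ref, x_pool[np.array(selected + [i])])
                  for i in candidates]
        selected.append(candidates[int(np.argmin(scores))])
    return selected

# Example usage: select 10 points from a random pool, using the pool as reference set.
# x_pool = np.random.randn(200, 8).astype(np.float32)
# batch_idx = greedy_select(x_pool, x_pool, batch_size=10)
```

Since the criterion only involves the kernel of the architecture, no network is ever trained during selection, which is what makes the procedure training-free.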
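
The experiment-setup row can also be read as a concrete configuration. Below is a hypothetical PyTorch sketch of the reported hyperparameters: the regression MLP follows the quoted description (two layers, width 512, with bias), while the initialisation scaling approximating σ_W = 1.0 and σ_b = 0.1, the helper names, and the example input dimension are assumptions; the classifier architectures for the image experiments are not specified in the quote and are therefore left out.

```python
# A hypothetical sketch of the reported training configuration in PyTorch.
# Helper names, the init scaling, and the input dimension are assumptions.
import torch
import torch.nn as nn

def make_regression_mlp(in_dim: int, width: int = 512,
                        sigma_w: float = 1.0, sigma_b: float = 0.1) -> nn.Module:
    model = nn.Sequential(
        nn.Linear(in_dim, width, bias=True),
        nn.ReLU(),
        nn.Linear(width, 1, bias=True),
    )
    for layer in model:
        if isinstance(layer, nn.Linear):
            # NTK-style parameterisation approximated by scaling the init.
            nn.init.normal_(layer.weight, std=sigma_w / layer.in_features ** 0.5)
            nn.init.normal_(layer.bias, std=sigma_b)
    return model

# Regression experiments: gradient descent with step size 0.01.
reg_model = make_regression_mlp(in_dim=8)  # input dimension is dataset-dependent
reg_optimizer = torch.optim.SGD(reg_model.parameters(), lr=0.01)

def make_classifier_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    # Classification experiments: SGD with learning rate 0.1 and weight decay
    # 0.005; training uses mini-batches of 32 for 100 epochs (loop not shown).
    return torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.005)
```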