Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

SEL-BALD: Deep Bayesian Active Learning with Selective Labels

Authors: Ruijiang Gao, Mingzhang Yin, Maytal Saar-Tsechansky

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type Experimental We conduct experiments on both synthetic and real-world datasets to demonstrate the effectiveness of our proposed algorithms.
Researcher Affiliation Academia Ruijiang Gao (Naveen Jindal School of Management, University of Texas at Dallas, Richardson, TX 75082); Mingzhang Yin (Warrington College of Business, University of Florida, Gainesville, FL 32611); Maytal Saar-Tsechansky (Information, Risk, and Operations Management, University of Texas at Austin, Austin, TX 78712)
Pseudocode Yes Algorithm 1 Bayesian Active Learning for Selective Labeling with Instance Rejection (SEL-BALD)
Open Source Code Yes The code is available at https://github.com/ruijiang81/SEL-BALD.
Open Datasets Yes We conduct experiments on both synthetic and real-world datasets. ... For our case study, we use the Give-Me-some-Credit (GMC) dataset [Credit Fusion, 2011]. ... We also examine each active learning method on a high-dimensional dataset MNIST [Le Cun, 1998]. ... More Real-World Datasets: We compare the proposed methods with baselines on Fashion MNIST [Xiao et al., 2017], CIFAR-10 [Krizhevsky, 2009], Adult [Becker and Kohavi, 1996] and Mushroom [mus, 1981] datasets in Appendix E.
Dataset Splits No The paper mentions 'training set' and 'test set' sizes for the synthetic, GMC, and MNIST datasets (e.g., '3700 samples as the training set and around 1600 samples as the test set' for synthetic data), but it does not explicitly specify a separate 'validation' split with percentages or sample counts for hyperparameter tuning.
Hardware Specification Yes We run the experiments on a server with 3 Nvidia A100 graphics cards and AMD EPYC 7763 64-Core Processor.
Software Dependencies No The paper mentions software components like 'Bayesian neural network', 'MC-dropout', and 'Adam optimizer', but it does not provide specific version numbers for any of these or for broader frameworks (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup Yes For the predictive model and human discretion model class, we use a Bayesian neural network and use MC-dropout [Gal et al., 2017] to approximate the posterior (we set the number of MC samples as 40 in all experiments). The model architecture is a 3-layer fully connected neural network with Leaky ReLU activation function. We use the Adam optimizer with a learning rate of 0.01. ... We set β = 0.75 for Joint-BALD-UCB in all experiments. ... GMC: The results are averaged over 3 runs with a query size of 10, 50 randomly examined instances initially, and a budget of 450. MNIST: The results are averaged over 3 runs with a query size of 20, 100 randomly examined instances initially, and a budget of 1000.
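The acquisition machinery quoted above can be sketched in a few lines. This is a minimal illustration of a BALD-style mutual-information score computed from MC-dropout predictive samples (the paper uses 40 samples), with a hypothetical multiplicative weighting by a human-discretion probability; it is not the authors' implementation. The synthetic logits, the 5 candidate points, and the stand-in `p_label` array are all illustrative assumptions, and the paper's actual variants (e.g. Joint-BALD-UCB with β = 0.75) combine the terms differently.

```python
import numpy as np

rng = np.random.default_rng(0)

def bald_score(probs, eps=1e-12):
    """BALD mutual information from MC-dropout samples.

    probs: (n_mc, n_points, n_classes) predictive probabilities,
    one slice per stochastic forward pass through the dropout model.
    Returns H[E p] - E H[p] per candidate point.
    """
    mean = probs.mean(axis=0)                                   # (n_points, n_classes)
    h_mean = -(mean * np.log(mean + eps)).sum(axis=-1)          # entropy of mean
    mean_h = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)  # mean entropy
    return h_mean - mean_h

# 40 MC-dropout samples over 5 candidate points, binary labels (illustrative
# data; in the paper these come from a 3-layer Leaky-ReLU network trained
# with Adam at lr = 0.01).
logits = rng.normal(size=(40, 5, 2))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

scores = bald_score(probs)
print(scores.shape)  # (5,)

# SEL-BALD additionally models whether the human will agree to label an
# instance; a hypothetical combination of the two signals:
p_label = rng.uniform(size=5)   # stand-in for the discretion-model output
joint = scores * p_label
best = int(np.argmax(joint))    # index of the next instance to query
```

By Jensen's inequality the BALD score is non-negative: the entropy of the averaged prediction always dominates the average per-sample entropy, and the gap is the epistemic (model) uncertainty that active learning targets.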