Efficient nonmyopic batch active search
Authors: Shali Jiang, Gustavo Malkomes, Matthew Abbott, Benjamin Moseley, Roman Garnett
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct thorough experiments on data from three application domains: a citation network, material science, and drug discovery, testing all proposed policies with a wide range of batch sizes. Our results demonstrate that the empirical performance gap matches our theoretical bound, that nonmyopic policies usually significantly outperform myopic alternatives, and that diversity is an important consideration for batch policy design. |
| Researcher Affiliation | Collaboration | Shali Jiang, CSE, WUSTL, St. Louis, MO 63130 (jiang.s@wustl.edu); Gustavo Malkomes, CSE, WUSTL, St. Louis, MO 63130 (luizgustavo@wustl.edu); Matthew Abbott, CSE, WUSTL, St. Louis, MO 63130 (mbabbott@wustl.edu); Benjamin Moseley, Tepper School of Business, CMU and Relational AI, Pittsburgh, PA 15213 (moseleyb@andrew.cmu.edu); Roman Garnett, CSE, WUSTL, St. Louis, MO 63130 (garnett@wustl.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implement all these policies with the MATLAB active learning toolbox: https://github.com/rmgarnett/active_learning |
| Open Datasets | Yes | We consider the first ten of the 120 datasets used in [7, 12] and only the ECFP4 fingerprint, which showed the best performance in those studies. These datasets share a pool of 100 000 negative compounds randomly selected from the ZINC database [20]. |
| Dataset Splits | No | The paper describes its experimental setup in terms of budget and repetitions, but it does not specify explicit training, validation, and test dataset splits in the conventional machine learning sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'the MATLAB active learning toolbox' but does not specify version numbers for MATLAB or the toolbox, which are required for reproducible software dependencies. |
| Experiment Setup | Yes | We use k nearest neighbor (k-nn) with k = 100 as our probability model for the drug discovery datasets, and k = 50 for the other two datasets (following the studies in [7, 12]). For each dataset, we start with one random initial positive seed observation and repeat the experiment 20 times. ... The budget is set as T = 500. We test batch-ENS with 16 and 32 samples, coded as batch-ENS-16 and batch-ENS-32. |
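The setup row above describes a k nearest neighbor probability model (k = 100 for drug discovery, k = 50 for the other domains). As a rough illustration of what such a model computes, here is a minimal Python sketch that estimates the positive-class probability of a query point as the fraction of positive labels among its k nearest observed neighbors. This is an assumption for illustration only: the function name, Euclidean distance metric, and unsmoothed fraction are hypothetical, and the paper's actual k-nn model (implemented in the MATLAB toolbox) may use a different distance and a smoothed estimate.

```python
import numpy as np

def knn_positive_probability(X_obs, y_obs, X_query, k=100):
    """Estimate P(y = 1 | x) for each query point as the fraction of
    positive labels among its k nearest observed neighbors.

    X_obs:   (n, d) array of observed feature vectors
    y_obs:   (n,) array of binary labels (1 = positive, 0 = negative)
    X_query: (m, d) array of unlabeled points to score
    """
    probs = []
    for x in X_query:
        # Euclidean distances from the query to all observed points.
        dists = np.linalg.norm(X_obs - x, axis=1)
        # Indices of the k nearest observed neighbors.
        nearest = np.argsort(dists)[:k]
        # Fraction of positives among those neighbors.
        probs.append(y_obs[nearest].mean())
    return np.array(probs)
```

In an active search loop, these scores would then feed the acquisition policy (e.g., greedily querying the highest-probability point, or a nonmyopic batch policy as in the paper).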