Efficient Nonmyopic Active Search

Authors: Shali Jiang, Gustavo Malkomes, Geoff Converse, Alyssa Shofner, Benjamin Moseley, Roman Garnett

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on diverse datasets from several domains: drug discovery, materials science, and a citation network. Our efficient nonmyopic policy recovers significantly more valuable points with the same budget than several alternatives from the literature, including myopic approximations to the optimal policy.
Researcher Affiliation | Academia | 1Washington University in St. Louis, St. Louis, MO, USA; 2Simpson College, Indianola, IA, USA; 3University of South Carolina, Columbia, SC, USA.
Pseudocode | No | The paper describes its algorithms mathematically and in prose, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "We implemented our approximation to the Bayesian optimal policy with the MATLAB active learning toolbox" (footnote 2, pointing to https://github.com/rmgarnett/active_learning). The link refers to a general-purpose toolbox used by the authors, not to a release of code specific to their proposed ENS policy.
Open Datasets | Yes | For our first real data experiment, we consider a subset of the CiteSeerx citation network, first described in (Garnett et al., 2012)... We compiled a database of 118 678 known alloys from the materials literature (e.g., (Kawazoe et al., 1997)), an extension of the dataset from (Ward et al., 2016)... The dataset comprises 120 activity classes of human biological importance selected from the BindingDB (Liu et al., 2007) database. ... ZINC database (Sterling & Irwin, 2015).
Dataset Splits | No | The paper describes how an initial training set is formed for active learning (e.g., "We select a single target (i.e., a NIPS paper) uniformly at random to form an initial training set."), but it does not specify traditional train/validation/test splits for a static dataset, since the data is queried sequentially during the active search process.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to conduct the experiments.
Software Dependencies | No | The paper mentions using the "MATLAB active learning toolbox" but does not specify its version, nor versions of any other software dependencies.
Experiment Setup | Yes | The budget is set to t = 500, and we use k = 50 in the k-NN model... We conduct the same experiments described for the CiteSeerx data above and show the results in Table 1... we again randomly select one positive as the initial training set and sequentially query t = 500 further points. We also report the performance of a baseline where we randomly sample a stratified sample of size 5% of the database... The experiment was repeated 20 times, varying the initial seed target.
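The experiment setup quoted above (seed with one random positive, then sequentially query t further points under a k-NN probability model) can be sketched as a minimal simulation. This is a hedged illustration only, not the paper's ENS policy or the authors' MATLAB toolbox: it implements a myopic one-step baseline, and the function names, Euclidean k-NN construction, and smoothing constants are all hypothetical choices.

```python
import numpy as np

def knn_probability(x_idx, labeled, labels, neighbors, k):
    """Estimated probability that point x_idx is a target: the smoothed
    fraction of its k nearest neighbors that are labeled positive.
    (Illustrative model; the paper's model details may differ.)"""
    nbrs = neighbors[x_idx][:k]
    pos = sum(labels[j] for j in nbrs if j in labeled)
    tot = sum(1 for j in nbrs if j in labeled)
    return (pos + 0.1) / (tot + 1.0)  # pseudocount smoothing

def greedy_active_search(features, true_labels, seed, budget=500, k=50):
    """Myopic (one-step) active search: starting from a single labeled
    positive, repeatedly query the unlabeled point with the highest
    estimated probability of being a target; return the number of
    targets found (including the seed)."""
    n = len(true_labels)
    # Precompute neighbor lists by Euclidean distance (drop self at rank 0).
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    neighbors = np.argsort(dists, axis=1)[:, 1:]
    labeled = {seed}
    labels = {seed: int(true_labels[seed])}
    found = int(true_labels[seed])
    for _ in range(budget):
        candidates = [i for i in range(n) if i not in labeled]
        if not candidates:
            break
        scores = [knn_probability(i, labeled, labels, neighbors, k)
                  for i in candidates]
        pick = candidates[int(np.argmax(scores))]
        labeled.add(pick)
        labels[pick] = int(true_labels[pick])
        found += int(true_labels[pick])
    return found
```

On a toy two-cluster dataset, the greedy policy expands outward from the seed positive and recovers the positive cluster well before exhausting the budget; repeating the run over several random seed targets, as in the paper's protocol, would then give mean performance.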