Efficient Nonmyopic Active Search

Authors: Shali Jiang, Gustavo Malkomes, Geoff Converse, Alyssa Shofner, Benjamin Moseley, Roman Garnett

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on diverse datasets from several domains: drug discovery, materials science, and a citation network. Our efficient nonmyopic policy recovers significantly more valuable points with the same budget than several alternatives from the literature, including myopic approximations to the optimal policy.
Researcher Affiliation | Academia | 1Washington University in St. Louis, St. Louis, MO, USA; 2Simpson College, Indianola, IA, USA; 3University of South Carolina, Columbia, SC, USA.
Pseudocode | No | The paper describes its algorithms mathematically and in prose, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "We implemented our approximation to the Bayesian optimal policy with the MATLAB active learning toolbox" (footnote 2, pointing to https://github.com/rmgarnett/active_learning). The link refers to a general-purpose toolbox used by the authors, not to a release of code specific to their proposed ENS policy.
Open Datasets | Yes | For our first real data experiment, we consider a subset of the CiteSeerx citation network, first described in (Garnett et al., 2012)... We compiled a database of 118 678 known alloys from the materials literature (e.g., (Kawazoe et al., 1997)), an extension of the dataset from (Ward et al., 2016)... The dataset comprises 120 activity classes of human biological importance selected from the BindingDB (Liu et al., 2007) database. ... ZINC database (Sterling & Irwin, 2015).
Dataset Splits | No | The paper describes how an initial training set is formed for active learning (e.g., "We select a single target (i.e., a NIPS paper) uniformly at random to form an initial training set."), but it does not specify traditional train/validation/test splits for a static dataset, since the data is queried sequentially during the active search process.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to conduct the experiments.
Software Dependencies | No | The paper mentions using the "MATLAB active learning toolbox" but does not specify its version, nor versions of any other software dependencies.
Experiment Setup | Yes | The budget is set to t = 500, and we use k = 50 in the k-NN model... We conduct the same experiments described for the CiteSeerx data above and show the results in Table 1... we again randomly select one positive as the initial training set and sequentially query t = 500 further points. We also report the performance of a baseline where we randomly sample a stratified sample of size 5% of the database... The experiment was repeated 20 times, varying the initial seed target.
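The experiment setup quoted above (seed with one random positive, then sequentially query t further points under a k-NN probability model) can be sketched as a minimal simulation. This is a hedged illustration only, not the paper's ENS policy or the authors' MATLAB toolbox: it implements a myopic one-step baseline, and the function names, Euclidean k-NN construction, and smoothing constants are all hypothetical choices.

```python
import numpy as np

def knn_probability(x_idx, labeled, labels, neighbors, k):
    """Estimated probability that point x_idx is a target: the smoothed
    fraction of its k nearest neighbors that are labeled positive.
    (Illustrative model; the paper's model details may differ.)"""
    nbrs = neighbors[x_idx][:k]
    pos = sum(labels[j] for j in nbrs if j in labeled)
    tot = sum(1 for j in nbrs if j in labeled)
    return (pos + 0.1) / (tot + 1.0)  # pseudocount smoothing

def greedy_active_search(features, true_labels, seed, budget=500, k=50):
    """Myopic (one-step) active search: starting from a single labeled
    positive, repeatedly query the unlabeled point with the highest
    estimated probability of being a target; return the number of
    targets found (including the seed)."""
    n = len(true_labels)
    # Precompute neighbor lists by Euclidean distance (drop self at rank 0).
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    neighbors = np.argsort(dists, axis=1)[:, 1:]
    labeled = {seed}
    labels = {seed: int(true_labels[seed])}
    found = int(true_labels[seed])
    for _ in range(budget):
        candidates = [i for i in range(n) if i not in labeled]
        if not candidates:
            break
        scores = [knn_probability(i, labeled, labels, neighbors, k)
                  for i in candidates]
        pick = candidates[int(np.argmax(scores))]
        labeled.add(pick)
        labels[pick] = int(true_labels[pick])
        found += int(true_labels[pick])
    return found
```

On a toy two-cluster dataset, the greedy policy expands outward from the seed positive and recovers the positive cluster well before exhausting the budget; repeating the run over several random seed targets, as in the paper's protocol, would then give mean performance.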