GeneDisco: A Benchmark for Experimental Design in Drug Discovery
Authors: Arash Mehrjou, Ashkan Soleymani, Andrew Jesson, Pascal Notin, Yarin Gal, Stefan Bauer, Patrick Schwab
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we introduce GeneDisco, a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration. We perform an extensive experimental baseline evaluation that establishes the relative performance of existing state-of-the-art methods on all the developed benchmark settings using a total of more than 20 000 central processing unit (CPU) hours of compute time. |
| Researcher Affiliation | Collaboration | Arash Mehrjou1, Ashkan Soleymani2, Andrew Jesson3, Pascal Notin3, Yarin Gal3, Stefan Bauer1, Patrick Schwab1; 1 GlaxoSmithKline, Artificial Intelligence & Machine Learning; 2 MIT; 3 Department of Computer Science, University of Oxford |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual descriptions but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration. GeneDisco consists of... an accessible open-source code base for evaluating and comparing new batch active learning methods for biological discovery. |
| Open Datasets | Yes | GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration. The GeneDisco benchmark curates and standardizes two types of datasets: three standardized feature sets describing interventions t (inputs to counterfactual estimators; Section 4.1), and four different in vitro genome-wide CRISPR experimental assays (predicted counterfactual outcomes; Section 4.2)... The benchmark includes four publicly available datasets, which have previously been published in a peer review process. |
| Dataset Splits | Yes | The size of the hidden layer is determined at each active learning cycle by k-fold cross validation against 20% of the acquired batch. The test data is a random 20% subset of the whole data that is left aside before the active learning process initiates, and is kept fixed across all experimental settings (i.e., for different datasets and different batch sizes) to enable a consistent comparison of the various acquisition functions, counterfactual estimator and treatment descriptor configurations. For all introduced datasets, we include a detailed description and the details on the train, test and validation splits at the beginning of Section 5. |
| Hardware Specification | Yes | We perform an extensive experimental baseline evaluation that establishes the relative performance of existing state-of-the-art methods on all the developed benchmark settings using a total of more than 20 000 central processing unit (CPU) hours of compute time. We additionally provide error bars over multiple random seeds and the code was executed on a cloud cluster with Intel CPUs. |
| Software Dependencies | No | The paper mentions 'the Scikit-learn package (Pedregosa et al., 2011)' but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | Setup. In order to assess current state-of-the-art methods on the GeneDisco benchmark, we perform an extensive baseline evaluation of 9 acquisition functions, 6 acquisition batch sizes and 4 experimental assays using in excess of 20 000 CPU hours of compute time... The employed counterfactual estimator ĝ is a multi-layer perceptron (MLP) that has one hidden layer with ReLU activation and a linear output layer. The size of the hidden layer is determined at each active learning cycle by k-fold cross validation against 20% of the acquired batch. At each cycle, the model is trained for at most 100 epochs but early stopping may interrupt training earlier if the validation error does not decrease. Each experiment is repeated with 5 random seeds to assess experimental variance. To choose the number of active learning cycles, we use the following strategy: the number of cycles are bounded to 40 for the acquisition batches of sizes 16, 32 and 64 due to the computational limits. For larger batch sizes, the number of cycles are reduced proportionally so that the same total number of data points are acquired throughout the cycles. At each cycle, the model is trained from scratch using the data collected up to that cycle, i.e. a trained model is not transferred to the future cycles. |
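The experiment-setup row describes a concrete batch active-learning protocol: a fixed cycle budget (40 cycles for batch sizes up to 64, proportionally fewer for larger batches so the total number of acquired points is constant), and a model retrained from scratch on all acquired data at every cycle. A minimal sketch of that loop, assuming hypothetical `train_model` and `acquire` callables standing in for GeneDisco's counterfactual estimator and acquisition functions:

```python
import random

def num_cycles(batch_size, max_cycles=40, base_batch=64):
    # Per the paper: 40 cycles for batch sizes 16, 32 and 64; for larger
    # batches, cycles shrink proportionally so the total number of
    # acquired points stays constant.
    if batch_size <= base_batch:
        return max_cycles
    return max_cycles * base_batch // batch_size

def run_active_learning(pool, batch_size, train_model, acquire, seed=0):
    """Sketch of one run: each cycle retrains from scratch on all data
    acquired so far, then acquires the next batch from the pool.
    `train_model` and `acquire` are placeholders, not GeneDisco APIs."""
    rng = random.Random(seed)
    acquired = []
    remaining = list(pool)
    for cycle in range(num_cycles(batch_size)):
        model = train_model(acquired)  # no warm-starting across cycles
        batch = acquire(model, remaining, batch_size, rng)
        acquired.extend(batch)
        remaining = [x for x in remaining if x not in batch]
    return acquired
```

With random acquisition (`lambda m, r, b, rng: rng.sample(r, min(b, len(r)))`) this exhausts a small pool over the cycle budget; in the benchmark itself, `acquire` would be one of the 9 evaluated acquisition functions.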