DiscoBAX: Discovery of optimal intervention sets in genomic experiment design
Authors: Clare Lyle, Arash Mehrjou, Pascal Notin, Andrew Jesson, Stefan Bauer, Yarin Gal, Patrick Schwab
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive experimental evaluation covering both synthetic and real-world experimental design tasks. DiscoBAX outperforms existing state-of-the-art methods for experimental design, selecting effective and diverse perturbations in biological systems. |
| Researcher Affiliation | Collaboration | ¹University of Oxford, ²Google DeepMind, ³GlaxoSmithKline, ⁴Helmholtz AI, ⁵Technical University of Munich |
| Pseudocode | Yes | Algorithm 1: SubsetSelect; Algorithm 2: DiscoBAX (a sketch of the greedy SubsetSelect step is given after the table). |
| Open Source Code | Yes | The implementation of DiscoBAX and the code to reproduce the experimental results are publicly available at https://github.com/amehrjou/DiscoBAX. |
| Open Datasets | Yes | The GeneDisco benchmark (Mehrjou et al., 2021) comprises five large-scale genome-wide CRISPR assays and compares the relative strengths of nine active learning algorithms... All experiments carried out in Section 5.2 leverage the Achilles dataset (Dempster et al., 2019) from GeneDisco to represent the different interventions. |
| Dataset Splits | No | The paper describes experimental setup details such as '25 consecutive batch acquisition cycles (with batch size 32)', notes that 'experiments are repeated 10 times with different random seeds', and reports hyperparameter selection on a single assay to 'mitigate the risk of overfitting'. However, it does not provide explicit numerical splits (e.g., percentages or counts) for training, validation, and test sets in the conventional sense of data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the computational experiments, such as specific GPU or CPU models, memory specifications, or cloud computing instances with their configurations. |
| Software Dependencies | No | The paper mentions the types of models and methods used, such as 'Bayesian Neural Networks', 'Gaussian Processes', and 'Monte Carlo dropout (MCD)' (see the sampling sketch below), but it does not specify version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For all methods and datasets, we perform 25 consecutive batch acquisition cycles (with batch size 32). All experiments are repeated 10 times with different random seeds... We find that on that dataset, the optimal hyperparameter values are k = 5, Levelset = 1.0, and S = 10 (the outer loop implied by this setup is sketched below). |
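The paper's Algorithm 1 (SubsetSelect) greedily builds a set of k interventions that maximizes a Monte Carlo estimate of E[max over x in S of f(x) + noise] across samples of the objective. Below is a minimal NumPy sketch of that greedy step, assuming the posterior/noise draws have already been collected into an array; the names `subset_select` and `f_samples` are our placeholders, not identifiers from the authors' code.

```python
import numpy as np

def subset_select(f_samples: np.ndarray, k: int) -> list[int]:
    """Greedy SubsetSelect sketch over Monte Carlo objective samples.

    f_samples: array of shape (n_samples, n_interventions), each row a
    joint draw of the noisy objective f(x) + eta(x) over all candidates.
    Returns indices of k interventions chosen greedily to maximize the
    Monte Carlo estimate of E[max_{x in S} f(x) + eta(x)].
    """
    n_samples, _ = f_samples.shape
    selected: list[int] = []
    best_so_far = np.full(n_samples, -np.inf)  # per-sample max over S
    for _ in range(k):
        # Expected value of the max over S union {x}, for every candidate x.
        gains = np.maximum(f_samples, best_so_far[:, None]).mean(axis=0)
        gains[selected] = -np.inf  # forbid re-selecting chosen interventions
        i = int(np.argmax(gains))
        selected.append(i)
        best_so_far = np.maximum(best_so_far, f_samples[:, i])
    return selected
```

The greedy rule is natural here because the E[max] objective is monotone submodular in the selected set, which gives the usual (1 - 1/e) approximation guarantee for greedy selection.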
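The 'Monte Carlo dropout (MCD)' surrogate noted under Software Dependencies approximates posterior function samples by keeping dropout stochastic at inference time. A minimal PyTorch sketch of that sampling step, under our own naming (the paper does not publish this helper or pin library versions):

```python
import torch

def enable_mc_dropout(model: torch.nn.Module) -> None:
    """Switch only the dropout layers to train mode so they stay stochastic."""
    model.eval()  # keep batch-norm statistics and other layers frozen
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()

def mc_dropout_samples(model: torch.nn.Module, x: torch.Tensor,
                       n_samples: int = 10) -> torch.Tensor:
    """Stack n_samples stochastic forward passes as approximate posterior draws."""
    enable_mc_dropout(model)
    with torch.no_grad():
        return torch.stack([model(x) for _ in range(n_samples)])
```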
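Finally, the reported setup (25 cycles of batch size 32, 10 seeds, k = 5, S = 10) implies an outer acquisition loop of roughly the following shape. This is a sketch under stated assumptions: `pool`, `label_fn`, `model`, and `acquire_scores` are hypothetical stand-ins for the candidate interventions, the assay readout, the surrogate model, and the DiscoBAX acquisition scoring, none of which are taken from the released code.

```python
import numpy as np

N_CYCLES, BATCH_SIZE = 25, 32   # acquisition schedule reported in the paper
K, N_FUNC_SAMPLES = 5, 10       # hyperparameters k and S reported in the paper

def run_acquisition_loop(pool, label_fn, model, acquire_scores):
    """One active-learning run; the paper repeats this for 10 random seeds."""
    labeled: dict[int, float] = {}
    for _ in range(N_CYCLES):
        candidates = [i for i in range(len(pool)) if i not in labeled]
        # Score the unlabeled pool (e.g. via SubsetSelect over posterior
        # samples, as in the sketch above) and acquire the top batch.
        scores = acquire_scores(model, pool, candidates,
                                k=K, n_samples=N_FUNC_SAMPLES)
        batch = [candidates[j] for j in np.argsort(scores)[-BATCH_SIZE:]]
        for i in batch:
            labeled[i] = label_fn(i)  # query the (simulated) assay
        idx = sorted(labeled)
        model.fit(pool[np.array(idx)],
                  np.array([labeled[i] for i in idx]))  # retrain surrogate
    return labeled
```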