DiscoBAX: Discovery of optimal intervention sets in genomic experiment design
Authors: Clare Lyle, Arash Mehrjou, Pascal Notin, Andrew Jesson, Stefan Bauer, Yarin Gal, Patrick Schwab
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive experimental evaluation covering both synthetic and real-world experimental design tasks. DiscoBAX outperforms existing state-of-the-art methods for experimental design, selecting effective and diverse perturbations in biological systems. |
| Researcher Affiliation | Collaboration | ¹University of Oxford, ²Google DeepMind, ³GlaxoSmithKline, ⁴Helmholtz AI, ⁵Technical University of Munich |
| Pseudocode | Yes | Algorithm 1: SubsetSelect; Algorithm 2: DiscoBAX (a sketch of the greedy SubsetSelect step is given after the table). |
| Open Source Code | Yes | The implementation of DiscoBAX and the code to reproduce the experimental results are publicly available at https://github.com/amehrjou/DiscoBAX. |
| Open Datasets | Yes | The GeneDisco benchmark (Mehrjou et al., 2021) comprises five large-scale genome-wide CRISPR assays and compares the relative strengths of nine active learning algorithms... All experiments carried out in Section 5.2 leverage the Achilles dataset (Dempster et al., 2019) from GeneDisco to represent the different interventions. |
| Dataset Splits | No | The paper describes experimental setup details such as '25 consecutive batch acquisition cycles (with batch size 32)', notes that 'experiments are repeated 10 times with different random seeds', and reports hyperparameter selection on a single assay to 'mitigate the risk of overfitting'. However, it does not provide explicit numerical splits (e.g., percentages or counts) for training, validation, and test sets in the conventional sense of data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the computational experiments, such as specific GPU or CPU models, memory specifications, or cloud computing instances with their configurations. |
| Software Dependencies | No | The paper mentions the types of models and methods used, such as 'Bayesian Neural Networks', 'Gaussian Processes', and 'Monte Carlo dropout (MCD)' (see the sampling sketch below), but it does not specify version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For all methods and datasets, we perform 25 consecutive batch acquisition cycles (with batch size 32). All experiments are repeated 10 times with different random seeds... We find that on that dataset, the optimal hyperparameter values are k = 5, Levelset = 1.0, and S = 10 (the outer loop implied by this setup is sketched below). |
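The paper's Algorithm 1 (SubsetSelect) greedily builds a set of k interventions that maximizes a Monte Carlo estimate of E[max over x in S of f(x) + noise] across samples of the objective. Below is a minimal NumPy sketch of that greedy step, assuming the posterior/noise draws have already been collected into an array; the names `subset_select` and `f_samples` are our placeholders, not identifiers from the authors' code.

```python
import numpy as np

def subset_select(f_samples: np.ndarray, k: int) -> list[int]:
    """Greedy SubsetSelect sketch over Monte Carlo objective samples.

    f_samples: array of shape (n_samples, n_interventions), each row a
    joint draw of the noisy objective f(x) + eta(x) over all candidates.
    Returns indices of k interventions chosen greedily to maximize the
    Monte Carlo estimate of E[max_{x in S} f(x) + eta(x)].
    """
    n_samples, _ = f_samples.shape
    selected: list[int] = []
    best_so_far = np.full(n_samples, -np.inf)  # per-sample max over S
    for _ in range(k):
        # Expected value of the max over S union {x}, for every candidate x.
        gains = np.maximum(f_samples, best_so_far[:, None]).mean(axis=0)
        gains[selected] = -np.inf  # forbid re-selecting chosen interventions
        i = int(np.argmax(gains))
        selected.append(i)
        best_so_far = np.maximum(best_so_far, f_samples[:, i])
    return selected
```

The greedy rule is natural here because the E[max] objective is monotone submodular in the selected set, which gives the usual (1 - 1/e) approximation guarantee for greedy selection.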
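The 'Monte Carlo dropout (MCD)' surrogate noted under Software Dependencies approximates posterior function samples by keeping dropout stochastic at inference time. A minimal PyTorch sketch of that sampling step, under our own naming (the paper does not publish this helper or pin library versions):

```python
import torch

def enable_mc_dropout(model: torch.nn.Module) -> None:
    """Switch only the dropout layers to train mode so they stay stochastic."""
    model.eval()  # keep batch-norm statistics and other layers frozen
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()

def mc_dropout_samples(model: torch.nn.Module, x: torch.Tensor,
                       n_samples: int = 10) -> torch.Tensor:
    """Stack n_samples stochastic forward passes as approximate posterior draws."""
    enable_mc_dropout(model)
    with torch.no_grad():
        return torch.stack([model(x) for _ in range(n_samples)])
```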
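Finally, the reported setup (25 cycles of batch size 32, 10 seeds, k = 5, S = 10) implies an outer acquisition loop of roughly the following shape. This is a sketch under stated assumptions: `pool`, `label_fn`, `model`, and `acquire_scores` are hypothetical stand-ins for the candidate interventions, the assay readout, the surrogate model, and the DiscoBAX acquisition scoring, none of which are taken from the released code.

```python
import numpy as np

N_CYCLES, BATCH_SIZE = 25, 32   # acquisition schedule reported in the paper
K, N_FUNC_SAMPLES = 5, 10       # hyperparameters k and S reported in the paper

def run_acquisition_loop(pool, label_fn, model, acquire_scores):
    """One active-learning run; the paper repeats this for 10 random seeds."""
    labeled: dict[int, float] = {}
    for _ in range(N_CYCLES):
        candidates = [i for i in range(len(pool)) if i not in labeled]
        # Score the unlabeled pool (e.g. via SubsetSelect over posterior
        # samples, as in the sketch above) and acquire the top batch.
        scores = acquire_scores(model, pool, candidates,
                                k=K, n_samples=N_FUNC_SAMPLES)
        batch = [candidates[j] for j in np.argsort(scores)[-BATCH_SIZE:]]
        for i in batch:
            labeled[i] = label_fn(i)  # query the (simulated) assay
        idx = sorted(labeled)
        model.fit(pool[np.array(idx)],
                  np.array([labeled[i] for i in idx]))  # retrain surrogate
    return labeled
```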