Removing Biases from Molecular Representations via Information Maximization

Authors: Chenyu Wang, Sharut Gupta, Caroline Uhler, Tommi S. Jaakkola

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks, including molecular property prediction and molecule-phenotype retrieval. We conduct experiments with two common readouts of high-content drug screens: LINCS gene expression profiles (Subramanian et al., 2017) and cell imaging profiles (Bray et al., 2017).
Researcher Affiliation | Academia | Chenyu Wang (1,2), Sharut Gupta (1), Caroline Uhler (2,3), Tommi Jaakkola (1). 1: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology; 2: Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard; 3: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology.
Pseudocode | No | The paper describes methods and derivations but does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/uhlerlab/InfoCORE.
Open Datasets | Yes | We conduct experiments with two common readouts of high-content drug screens: LINCS gene expression profiles (Subramanian et al., 2017) and cell imaging profiles obtained from the Cell Painting assay (Bray et al., 2017).
Dataset Splits | No | To mimic this process, similar to the setting in Zheng et al. (2022), we randomly split both datasets into a training set consisting of 80% of the molecules and hold out the remaining molecules for testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions software such as Mol2vec, CellProfiler, CLIP, CCL, MoCo, and SimCLR, but does not specify their version numbers, nor the version of Python or other programming languages used.
Experiment Setup | Yes | For InfoCORE, the weighting hyperparameter α is set to 0.09 and the gradient-adjustment hyperparameter λ to 0.1. For InfoCORE, we use 2-layer MLPs as the classifiers and set the hyperparameter α to 0.33 for GE and 0.83 for CP, and λ to 0 for both datasets. The level of noise is controlled by a hyperparameter α_noise; we set α_noise = 0.5 for GE and α_noise = 0 for CP in our experiments. Mixup weights w are drawn from a symmetric Dirichlet distribution with hyperparameter α_dir: w ~ Dirichlet(α_dir). We set α_dir = 0.6 for GE and α_dir = 0.8 for CP in our experiments. Dropout proportions α_drop are set to 0.1 for both datasets.
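The Dirichlet-weighted mixup quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is an illustrative sketch only, not code from the InfoCORE repository: the function name `dirichlet_mixup` and its list-of-vectors interface are assumptions. It uses the standard construction of a symmetric Dirichlet sample, normalizing k independent Gamma(α_dir, 1) draws.

```python
import random

def dirichlet_mixup(xs, alpha_dir=0.6, seed=None):
    """Mix k samples with weights w ~ Dirichlet(alpha_dir, ..., alpha_dir).

    A symmetric Dirichlet is sampled by drawing k Gamma(alpha_dir, 1)
    variates and normalizing them to sum to 1; the mixed sample is the
    convex combination sum_i w_i * x_i of the k input vectors.
    """
    rnd = random.Random(seed)
    gammas = [rnd.gammavariate(alpha_dir, 1.0) for _ in xs]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    # Weighted sum over the k inputs, coordinate by coordinate.
    return [sum(w * x[j] for w, x in zip(weights, xs))
            for j in range(len(xs[0]))]
```

With k = 2 inputs this reduces to the familiar Beta(α_dir, α_dir) mixup; per the quoted setup, α_dir = 0.6 is used for the gene-expression (GE) data and 0.8 for Cell Painting (CP).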