CoDrug: Conformal Drug Property Prediction with Density Estimation under Covariate Shift
Authors: Siddhartha Laghuvarapu, Zhen Lin, Jimeng Sun
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive experiments involving realistic distribution drifts in various small-molecule drug discovery tasks, we demonstrate the ability of CoDrug to provide valid prediction sets and its utility in addressing the distribution shift arising from de novo drug design models. On average, using CoDrug can reduce the coverage gap by over 35% when compared to conformal prediction sets not adjusted for covariate shift. |
| Researcher Affiliation | Academia | Siddhartha Laghuvarapu Department of Computer Science University of Illinois Urbana-Champaign Urbana, IL 61801 sl160@illinois.edu Zhen Lin Department of Computer Science University of Illinois Urbana-Champaign Urbana, IL 61801 zhenlin4@illinois.edu Jimeng Sun Department of Computer Science Carle Illinois College of Medicine University of Illinois Urbana-Champaign Urbana, IL 61801 jimeng@illinois.edu |
| Pseudocode | Yes | Algorithm 1 Procedure for Property Prediction Training: |
| Open Source Code | Yes | The code associated with the paper is available at https://github.com/siddharthal/CoDrug/ |
| Open Datasets | Yes | Datasets: We use four binary classification datasets for toxicity prediction (AMES, Tox21, ClinTox) and activity prediction (HIV activity), obtained from TDC [28]. |
| Dataset Splits | Yes | Splitting Ratio: The datasets are split in the ratio of 70:15:15, for training, calibration and testing the CP model. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions software like PyTorch, PyTorch Lightning, ADAM, DGL-LifeSci, RDKit, and DeepChem, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Training hyperparameters: We train the model using the PyTorch Lightning framework. We use the ADAM optimizer [35]. The batch size is set to 64, and the learning rate is set to 0.001. Architecture details: The model architecture consists of a GNN layer (AttentiveFP [1]), a readout layer, 2 hidden FCNN layers, and an output layer. The hidden state size in the GNN is set to 512 dimensions. The hidden FCNN layers have 256 and 8 dimensions, respectively. Energy regularization hyperparameters: The parameters m_in and m_out in Eq. (15) are set to -5 and -35 respectively, and the parameter λ in Eq. (16) is set to 0.01. |
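The 70:15:15 split and the reported hyperparameters can be sketched as follows. This is an illustrative reconstruction, not the paper's actual code: `split_dataset` and the `HPARAMS` names are assumptions made here for clarity, while the numeric values are the ones quoted in the table above.

```python
import random

# Hyperparameters reported in the paper's experiment setup
# (the dict itself and its key names are illustrative, not from the paper's code).
HPARAMS = {
    "batch_size": 64,
    "learning_rate": 1e-3,
    "gnn_hidden_dim": 512,       # AttentiveFP hidden state size
    "fc_hidden_dims": (256, 8),  # two hidden FCNN layers
    "m_in": -5,                  # energy regularization, Eq. (15)
    "m_out": -35,                # energy regularization, Eq. (15)
    "lambda": 0.01,              # regularization weight, Eq. (16)
}

def split_dataset(n, seed=0):
    """Shuffle n example indices and split them 70:15:15 into
    train / calibration / test, as described in the paper."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(0.70 * n)
    n_cal = int(0.15 * n)
    return idx[:n_train], idx[n_train:n_train + n_cal], idx[n_train + n_cal:]

train_idx, cal_idx, test_idx = split_dataset(1000)
print(len(train_idx), len(cal_idx), len(test_idx))  # → 700 150 150
```

The separate calibration split is what the conformal prediction step uses to compute nonconformity-score quantiles, so it must be disjoint from both the training and test sets.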