Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

Authors: Andaç Demir, Baris Coskunuzer, Yulia Gel, Ignacio Segovia-Dominguez, Yuzhou Chen, Bulent Kiziltan

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive numerical experiments in VS, showing that our ToDD models outperform all state-of-the-art methods by a wide margin (See Figure 1).
Researcher Affiliation | Collaboration | Andac Demir (Novartis), Baris Coskunuzer (University of Texas at Dallas), Ignacio Segovia-Dominguez (University of Texas at Dallas; Jet Propulsion Laboratory, Caltech), Yuzhou Chen (Temple University), Yulia Gel (University of Texas at Dallas; National Science Foundation), Bulent Kiziltan (Novartis)
Pseudocode | No | The paper describes the process in numbered steps but does not provide a formal pseudocode block or algorithm listing.
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor a direct link to a code repository for this work.
Open Datasets | Yes | Cleves-Jain: This is a relatively small dataset [26] that has 1149 compounds.* There are 22 different drug targets, and for each of them the dataset provides only 2-3 template active compounds dedicated for model training, which presents a few-shot learning task. ... *Cleves-Jain dataset: https://www.jainlab.org/Public/SF-Test-Data-DrugSpace-2006.zip DUD-E Diverse: DUD-E (Directory of Useful Decoys, Enhanced) [67] is a comprehensive ligand dataset with 102 targets and approximately 1.5 million compounds.* ... *DUD-E Diverse dataset: http://dude.docking.org/subsets/diverse
Dataset Splits | Yes | The performance of all models was assessed by 5-fold cross-validation (CV).
Hardware Specification | Yes | Training time of ToDD-ViT and ToDD-ConvNeXt for each individual drug target takes less than 1 hour on a single GPU (NVIDIA RTX 2080 Ti).
Software Dependencies | No | The paper mentions software components such as the Adam optimizer and ConvNeXt_tiny models but does not specify their version numbers or the versions of underlying libraries such as Python or PyTorch.
Experiment Setup | Yes | Transfer learning via fine-tuning ViT_b_16 and ConvNeXt_tiny models using the Adam optimizer with a learning rate of 5e-4, no warmup or layerwise learning rate decay, a cosine annealing schedule for 5 epochs, stochastic weight averaging for 5 epochs, weight decay of 1e-4, and a batch size of 64 for 10 epochs in total led to significantly better performance in Enrichment Factor and ROC-AUC scores compared to training from scratch.
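The fine-tuning recipe reported above (Adam at lr 5e-4 with weight decay 1e-4, cosine annealing for 5 epochs, then stochastic weight averaging for the last 5 of 10 total epochs, batch size 64) can be sketched in PyTorch as follows. This is a minimal sketch, not the authors' code: a tiny linear model and random toy data stand in for the paper's ViT_b_16 / ConvNeXt_tiny models and compound-fingerprint images, so the loop runs without pretrained weights.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

# Stand-in for a pretrained ViT_b_16 / ConvNeXt_tiny backbone (assumption).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))

# Adam with the reported learning rate and weight decay.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-4)

# 10 epochs total: cosine annealing for the first 5, SWA for the last 5.
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=5e-4)

# Toy dataset standing in for fingerprint images; batch size 64 as reported.
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(256, 3, 8, 8),
                                   torch.randint(0, 2, (256,))),
    batch_size=64)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch < 5:
        cosine.step()                       # cosine-annealing phase
    else:
        swa_model.update_parameters(model)  # SWA phase: average weights
        swa_scheduler.step()

update_bn(loader, swa_model)  # refresh BatchNorm stats for averaged weights
```

After training, `swa_model` holds the averaged weights and is used for evaluation; the phase boundary at epoch 5 reflects the reported 5-epoch cosine schedule followed by 5 epochs of SWA.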