Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery
Authors: Andaç Demir, Baris Coskunuzer, Yulia Gel, Ignacio Segovia-Dominguez, Yuzhou Chen, Bulent Kiziltan
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive numerical experiments in VS, showing that our ToDD models outperform all state-of-the-art methods by a wide margin (See Figure 1). |
| Researcher Affiliation | Collaboration | Andac Demir (Novartis, EMAIL); Baris Coskunuzer (University of Texas at Dallas, EMAIL); Ignacio Segovia-Dominguez (University of Texas at Dallas and Jet Propulsion Laboratory, Caltech); Yuzhou Chen (Temple University); Yulia Gel (University of Texas at Dallas and National Science Foundation); Bulent Kiziltan (Novartis, EMAIL) |
| Pseudocode | No | The paper describes the process in numbered steps but does not provide a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology or a direct link to a code repository for their work. |
| Open Datasets | Yes | Cleves-Jain: This is a relatively small dataset [26] that has 1149 compounds.* There are 22 different drug targets, and for each one of them the dataset provides only 2-3 template active compounds dedicated for model training, which presents a few-shot learning task. ... *Cleves-Jain dataset: https://www.jainlab.org/Public/SF-Test-Data-DrugSpace-2006.zip DUD-E Diverse: DUD-E (Directory of Useful Decoys, Enhanced) dataset [67] is a comprehensive ligand dataset with 102 targets and approximately 1.5 million compounds.* ... *DUD-E Diverse dataset: http://dude.docking.org/subsets/diverse |
| Dataset Splits | Yes | The performance of all models was assessed by 5-fold cross-validation (CV). |
| Hardware Specification | Yes | Training time of ToDD-ViT and ToDD-ConvNeXt for each individual drug target takes less than 1 hour on a single GPU (NVIDIA RTX 2080 Ti). |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'ConvNeXt_tiny models' but does not specify their version numbers or the versions of underlying libraries like Python or PyTorch. |
| Experiment Setup | Yes | Transfer learning via fine-tuning ViT_b_16 and ConvNeXt_tiny models using Adam optimizer with a learning rate of 5e-4, no warmup or layerwise learning rate decay, cosine annealing schedule for 5 epochs, stochastic weight averaging for 5 epochs, weight decay of 1e-4, and a batch size of 64 for 10 epochs in total led to significantly better performance in Enrichment Factor and ROC-AUC scores compared to training from scratch. |
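
The schedule quoted in the Experiment Setup row can be illustrated with a minimal, dependency-free sketch. It is not the authors' code (no code release is reported above). The values 5e-4, 1e-4, 64, and the 5 + 5 epoch split come from the quoted text; the ordering of the two phases (cosine annealing first, then stochastic weight averaging), the cosine floor `ETA_MIN`, the constant `SWA_LR`, and the helper names `learning_rate` and `swa_update` are assumptions made for illustration only.

```python
import math

# Hyperparameters quoted in the Experiment Setup row.
BASE_LR = 5e-4
WEIGHT_DECAY = 1e-4   # listed for completeness; not used by the schedule itself
BATCH_SIZE = 64       # listed for completeness; not used by the schedule itself
COSINE_EPOCHS = 5
SWA_EPOCHS = 5

# Assumptions, not stated in the source: the floor of the cosine schedule
# and the constant learning rate held during the averaging phase.
ETA_MIN = 0.0
SWA_LR = 5e-5


def learning_rate(epoch: int) -> float:
    """Return the learning rate for a 0-indexed epoch.

    Epochs 0..4 follow the standard cosine annealing formula;
    epochs 5..9 use a constant rate while weights are averaged.
    """
    total = COSINE_EPOCHS + SWA_EPOCHS
    if not 0 <= epoch < total:
        raise ValueError(f"epoch must be in [0, {total})")
    if epoch < COSINE_EPOCHS:
        cosine = math.cos(math.pi * epoch / COSINE_EPOCHS)
        return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * (1.0 + cosine)
    return SWA_LR


def swa_update(average, weights, n_averaged):
    """Fold one new weight vector into an equal-weight running average.

    `n_averaged` is how many weight vectors `average` already contains.
    """
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(average, weights)]


# Per-epoch learning rates for the 10 epochs in total.
schedule = [learning_rate(epoch) for epoch in range(COSINE_EPOCHS + SWA_EPOCHS)]
```

In a PyTorch setup, the same roles would typically be played by `torch.optim.lr_scheduler.CosineAnnealingLR` and the utilities in `torch.optim.swa_utils`; whether the paper used these exact components is not stated in the quoted text.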