Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improving Antibody Humanness Prediction using Patent Data
Authors: Talip Ucar, Aubin Ramon, Dino Oglic, Rebecca Croasdale-Wood, Tom Diethe, Pietro Sormanni
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that the learned model consistently outperforms the alternative baselines and establishes new state-of-the-art on five out of six inference tasks, irrespective of the used metric. |
| Researcher Affiliation | Collaboration | 1 Centre for AI, BioPharmaceuticals R&D, AstraZeneca; 2 Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge; 3 Biologics Engineering, Oncology R&D, AstraZeneca. |
| Pseudocode | No | The paper describes the SelfPAD framework and its training processes using explanatory text and diagrams (Figure 1), but it does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | The code for SelfPAD is available at: https://github.com/AstraZeneca/SelfPAD |
| Open Datasets | Yes | We use patented antibody database (PAD) (Krawczyk et al., 2021)... 553 Therapeutics is a dataset from Prihoda et al. (2022)... 217 immunogenicity refers to the dataset obtained from Prihoda et al. (2022)... 25 humanization data refers to the dataset... Marks et al. (2021). |
| Dataset Splits | Yes | The training set is then split into two folds, 90% for training and 10% for validation, and the model is fine-tuned with cross-entropy loss for 25 epochs. |
| Hardware Specification | Yes | We used a compute cluster consisting of A10G GPUs throughout this work. |
| Software Dependencies | Yes | We implemented our work using PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | We pre-trained the model with a batch size of 100 for 1000 epochs (see Figure 5 in the appendix)... we used cross-entropy loss with label smoothing (configured to 0.5) and a batch size of 512 for 25 epochs... Table 9 lists hyperparameters used for pre-training and fine-tuning stages. |
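
The fine-tuning details quoted in the Dataset Splits and Experiment Setup rows (a 90%/10% train/validation split and cross-entropy loss with label smoothing set to 0.5) can be illustrated with a minimal, standard-library sketch. This is not the authors' code. The helper names `split_train_val` and `smoothed_cross_entropy`, the random seed, the two-class example, and the choice to spread the smoothing mass uniformly over all classes (the convention used by PyTorch's `label_smoothing` argument) are assumptions made for illustration.

```python
import math
import random


def split_train_val(items, val_fraction=0.1, seed=0):
    """Shuffle a copy of `items` and split it into (train, val) lists.

    Hypothetical helper: the seed and shuffling strategy are assumptions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def smoothed_cross_entropy(logits, target, smoothing=0.5):
    """Cross-entropy for one example against a label-smoothed target.

    Assumed convention: the target distribution keeps (1 - smoothing) on the
    true class and spreads `smoothing` uniformly over all classes.
    """
    n_classes = len(logits)
    # Numerically stable log-softmax over the logits.
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_norm for x in logits]
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        weight = smoothing / n_classes
        if k == target:
            weight += 1.0 - smoothing
        loss -= weight * log_p
    return loss


# 90% training / 10% validation, as in the Dataset Splits row.
train, val = split_train_val(range(1000), val_fraction=0.1)

# Illustrative two-class logits; label smoothing 0.5 as in the Experiment Setup row.
loss = smoothed_cross_entropy([2.0, -1.0], target=0, smoothing=0.5)
```

With smoothing at 0.5 in a two-class setting, the true class receives a target weight of 0.75 and the other class 0.25, so the loss penalizes overconfident predictions far more than plain cross-entropy does.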