Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN

Authors: Talal Widatalla, Richard W. Shuai, Brian Hie, Possu Huang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that learning these distributions jointly is a highly synergistic task that both improves sequence recovery while achieving state-of-the-art sidechain packing. Furthermore, benefits from full-atom modeling generalize from sequence recovery to practical protein design applications, such as zero-shot prediction of experimental binding and stability measurements. ... To benchmark FAMPNN's sequence recovery compared to other methods, we report median sequence recovery on the CATH 4.2 test set in Table 1.
Researcher Affiliation | Collaboration | 1Department of Biophysics, Stanford University, Stanford, CA 2Arc Institute, Palo Alto, CA 3Department of Chemical Engineering, Stanford University, Stanford, CA 4Stanford Data Science, Stanford University, Stanford, CA 5Department of Bioengineering, Stanford University, Stanford, CA. Correspondence to: Po-Ssu Huang <EMAIL>, Brian L. Hie <EMAIL>.
Pseudocode | Yes | Algorithm 1 Multimer Contiguous Crop ... Algorithm 2 Full-atom Encoder ... Algorithm 3 Rigid from 3 points using the Gram-Schmidt process ... Algorithm 4 Sidechain diffusion MLP ... Algorithm 5 Conditioned MLP Block ... Algorithm 6 Final Layer ... Algorithm 7 Sidechain Confidence Prediction
Open Source Code | Yes | Code for FAMPNN is available at https://github.com/richardshuai/fampnn.
Open Datasets | Yes | We report median sequence recovery on the CATH 4.2 test set in Table 1. ... we evaluate on RFdiffusion-generated de novo backbones from lengths 100 to 500 (Ye et al., 2024a). ... We compare our sidechain packing performance to other methods on CASP13, 14, and 15 targets (Table 2). ... We evaluate FAMPNN (0.3 Å) on three stability datasets, five antibody-antigen binding affinity datasets, and two versions of a general protein-protein binding affinity dataset ... To evaluate zero-shot performance for prediction of protein-protein binding affinity, we evaluate on SKEMPIv2 ... To evaluate zero-shot prediction of protein stability, we evaluate on S669, Megascale, and FireProtDB
Dataset Splits | Yes | We utilized the CATH 4.2 (Knudsen & Wiuf, 2010) S40 dataset which is a curation of domains extracted from the PDB with redundant domains (those with >40% homology) removed, with training, validation and test splits identical to Ingraham et al. (Ingraham et al., 2019). ... For each length in {100, 200, 300, 400, 500}, we generated 100 samples from RFdiffusion using the default parameters. ... The CASP13 and CASP14 test sets were obtained directly from the official AttnPacker GitHub repository (McPartlon & Xu, 2023), and the CASP15 targets were downloaded from the CASP data archive. ... We additionally evaluate on a test subset of SKEMPIv2 recently curated in Bushuiev et al. to address significant data leakage issues in supervised models trained on previously proposed data splits (Bushuiev et al., 2024).
Hardware Specification | Yes | The CATH trained FAMPNN models were trained on a single NVIDIA H100 GPU with 80GB of RAM... The PDB trained models were trained on 4 NVIDIA H100 GPUs with 80GB of RAM per GPU
Software Dependencies | No | The paper mentions using Python implicitly for implementation, and refers to various models and tools like AlphaFold2, MMseqs2, ProteinMPNN, ESM-IF1, etc., by citing their respective papers or repositories. However, it does not provide specific version numbers for any of the general software dependencies (e.g., Python, PyTorch, CUDA) or the mentioned tools (e.g., MMseqs2).
Experiment Setup | Yes | The CATH trained FAMPNN models were trained on a single NVIDIA H100 GPU with 80GB of RAM, with a batch size of 64 and fixed example size of 256 residues. The models were trained until 100k steps... The PDB trained models were trained on 4 NVIDIA H100 GPUs with 80GB of RAM per GPU, with a batch size of 8 per GPU and fixed example size of 1024 residues. To increase the effective batch size, 4 gradient accumulation steps were taken for each backpropagation step, bringing the effective batch size to 128 examples. The models were trained until 300k steps... We add independent noise to each x, y, and z coordinate by sampling from a normal distribution scaled by a factor of 0.3. Specifically, for each coordinate, the perturbation is drawn from N(0, σ²), with standard deviation σ = 0.3. ... For sampling, we run a full trajectory using 50 steps of diffusion with a step scale η = 1.5. Parameters: σmin = 0.01, σmax = 80, σdata = 0.66, ρ = 7, Pmean = 1.5, Pstd = 1.0.
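The experiment-setup row quotes hyperparameters whose names (σmin, σmax, σdata, ρ, Pmean, Pstd) match the standard EDM formulation of Karras et al. (2022). As a reading aid, the following is a minimal sketch of how such values are conventionally used, assuming that formulation applies here; the function names and structure are illustrative, not the authors' code.

```python
import math
import random

# EDM-style hyperparameters as reported in the paper.
SIGMA_MIN, SIGMA_MAX = 0.01, 80.0   # noise-level range for sampling
SIGMA_DATA, RHO = 0.66, 7.0         # data scale and schedule exponent
P_MEAN, P_STD = 1.5, 1.0            # log-normal training-noise parameters

def noise_coordinates(coords, sigma=0.3, rng=random.Random(0)):
    """Perturb each x, y, z coordinate independently with N(0, sigma^2),
    matching the sigma = 0.3 noising described for training."""
    return [[c + rng.gauss(0.0, sigma) for c in atom] for atom in coords]

def sampling_sigmas(num_steps=50):
    """Standard EDM discretization of a sampling trajectory (the paper
    reports 50 diffusion steps):
    sigma_i = (smax^(1/rho) + i/(N-1) * (smin^(1/rho) - smax^(1/rho)))^rho."""
    inv = 1.0 / RHO
    return [
        (SIGMA_MAX**inv + i / (num_steps - 1) * (SIGMA_MIN**inv - SIGMA_MAX**inv)) ** RHO
        for i in range(num_steps)
    ]

def training_sigma(rng=random.Random(0)):
    """Draw a training noise level: ln(sigma) ~ N(P_mean, P_std^2)."""
    return math.exp(rng.gauss(P_MEAN, P_STD))

# Effective batch size for the PDB-trained models, as quoted:
# 4 GPUs x 8 examples per GPU x 4 gradient-accumulation steps = 128.
```

The schedule runs from σmax down to σmin, so the first sampled noise level is 80 and the last is 0.01; the step scale η = 1.5 would modulate the per-step update in the full sampler, which is omitted here.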