Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Latent Retrieval Augmented Generation of Cross-Domain Protein Binders

Authors: Zishen Zhang, Xiangzhe Kong, Wenbing Huang, Yang Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that RADiAnce significantly outperforms baseline models across multiple metrics, including binding affinity and recovery of geometries and interactions. Additional experimental results validate cross-domain generalization, demonstrating that retrieving interfaces from diverse domains, such as peptides, antibodies, and protein fragments, enhances the generation performance of binders for other domains. We evaluate RADiAnce from two key aspects: Retrieval Reliability (Section 4.1) and Generation Performance (Section 4.2), using peptide, antibody, and protein fragments from existing literature.
Researcher Affiliation	Academia	Zishen Zhang (1,2), Xiangzhe Kong (1,2), Wenbing Huang (3), Yang Liu (1,2). (1) Dept. of Comp. Sci. & Tech., Tsinghua University; (2) Institute for AIR, Tsinghua University; (3) Gaoling School of Artificial Intelligence, Renmin University of China
Pseudocode	Yes	Algorithm 1 Inference Workflow. 1: Input: Binding site structure Y. 2: k_y ← E_ϕ(Y) {Encode binding site into query key}. 3: for each v^(j) ∈ D do. 4: s_j ← ⟨k_y, v^(j)⟩ {Compute similarity by inner product}. 5: end for. 6: I ← TopK({s_j}) {Indices of top-K highest similarity scores}. 7: T_v ← {v^(k) : k ∈ I} {Retrieve binder latents to form prompt features}. 8: Z_T ∼ N(0, I) {Initialize latent variable from Gaussian}. 9: for t = T down to 1 do. 10: ϵ ← ϵ_θ(Z_t^x, Z_y, T_v, t) {Predict noise}. 11: Z_{t−1} ← Denoise(Z_t, ϵ) {Refine latent variable}. 12: end for. 13: X̂ ← D_ξ(Z_0) {Decode final latent to molecular graph}. 14: Output: Generated binder structure X̂.
Open Source Code	Yes	The source code of the RADiAnce is available at https://github.com/srhn225/RADiAnce.
Open Datasets	Yes	For peptide design, we adopt PepBench [30], which includes 4,157 training and 114 validation complexes, with 93 test cases from the LNR benchmark [49]. For antibodies, we follow the literature [31] to use 9,473 training and 400 validation entries from SAbDab [14], and 60 test cases from the RAbD benchmark [3]. We further include ProtFrag [30], a dataset of 70,498 monomer-derived protein fragments, to assess the cross-domain referencing ability of our framework.
Dataset Splits	Yes	For peptide design, we adopt PepBench [30], which includes 4,157 training and 114 validation complexes, with 93 test cases from the LNR benchmark [49]. For antibodies, we follow the literature [31] to use 9,473 training and 400 validation entries from SAbDab [14], and 60 test cases from the RAbD benchmark [3].
Hardware Specification	Yes	We trained our model using 8 GPUs with 80 GB memory in parallel.
Software Dependencies	No	The paper discusses using official implementations of various baselines and retraining them on their datasets, but it does not specify concrete software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) in the provided text.
Experiment Setup	Yes	The hyperparameter configurations for both Contrastive VAE and Diffusion models are summarized in Table 8. ... Table 8: Hyperparameters of RADiAnce (Name | Configuration | Description). Contrastive VAE: Encoder / Decoder Type | EPT ... Conditional Latent Diffusion: hidden_size | 512 | Dimension of hidden states; T | 100 | Diffusion steps; n_layers | 6 | Number of denoising layers; n_heads | 8 | Number of heads for multihead self and cross attention; n_rbf | 64 | Number of RBF kernels; cutoff | 3.0 Å | Cutoff distance for RBF kernels
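
The retrieval part of Algorithm 1 in the Pseudocode row (steps 3 to 7) amounts to an inner-product top-K search over stored binder latents. The following is a minimal sketch of that step only, not the authors' implementation: the function names `inner_product` and `retrieve_top_k` are hypothetical, the query key is assumed to be already produced by the encoder E_ϕ, and latents are modeled as plain lists of floats rather than tensors.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity s_j = <k_y, v^(j)> from step 4 of Algorithm 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(
    query_key: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[int], List[Sequence[float]]]:
    """Return (indices I, retrieved latents T_v) for the k most similar entries.

    Hypothetical helper: `database` stands in for the latent store D,
    and `query_key` stands in for k_y, the encoded binding site.
    """
    # Steps 3-5: score every stored latent against the query key.
    scores = [(inner_product(query_key, v), j) for j, v in enumerate(database)]
    # Step 6: indices of the k highest similarity scores, best first.
    top = heapq.nlargest(k, scores, key=lambda pair: pair[0])
    indices = [j for _, j in top]
    # Step 7: gather the retrieved latents to form the prompt features.
    return indices, [database[j] for j in indices]


if __name__ == "__main__":
    # Toy 2-D latents; real latents would come from the trained encoder.
    toy_database = [[0.2, 0.9], [0.9, 0.1], [0.5, 0.5], [-1.0, 0.0]]
    toy_query = [1.0, 0.0]
    idx, latents = retrieve_top_k(toy_query, toy_database, k=2)
    print(idx, latents)
```

A linear scan like this is only practical for small stores; for a database on the scale of the datasets listed above, an approximate nearest-neighbor index would be a natural substitute, though the table does not say which search method the paper uses.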