Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies
Authors: Yibo Wen, Chenwei Xu, Jerry Yao-Chieh Hu, Kaize Ding, Han Liu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed framework, AlignAb, on the task of designing antigen-binding CDRH3 regions. In Section 5.1, we outline the experimental setup for the three training stages. We then introduce the evaluation metrics and discuss the main results in Section 5.2, followed by comprehensive ablation studies in Section 5.3. We report the main results in Table 2. We also include metrics for RMSD and AAR in Table 4 and additional binding and developability metrics in Table 5. We present visualization examples in Figure 4. Overall, AlignAb outperforms baseline methods and narrows the gap between generated and natural antibodies. |
| Researcher Affiliation | Academia | Northwestern University |
| Pseudocode | Yes | Algorithm 1 Iterative Pareto-Optimal Energy Alignment. 1: Input: initial dataset $\hat{D}_0 = \emptyset$, KL regularization coefficient $\beta$, total online iterations $T$, batch size $m$, reference model $\pi_{\text{ref}}$, initial model $\pi_0 = \pi_{\text{ref}}$, and reward model $\hat{r}$. 2: for $t = 0, 1, 2, \dots, T$ do 3: Sample input prompts $x_i \sim X$ for $i = 1, \dots, m$. 4: Generate two responses for each prompt: $y_i^{(1)}, y_i^{(2)} \sim \pi_t(\cdot \mid x_i)$. 5: Calculate rewards $\hat{r}(x_i, y_i^{(1)})$ and $\hat{r}(x_i, y_i^{(2)})$ for all $i \in [m]$, and collect them as $\hat{D}_t$. 6: Optimize $\pi_{t+1}$ with $\hat{D}_{0:t}$ according to (4.5): $\pi_{t+1} \leftarrow \arg\min_{\pi} \mathbb{E}_{(x, y_w, y_l) \sim \hat{D}_{0:t}} \left[ -\log \sigma\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} - \Delta_{\hat{r}} \right) \right]$. 7: end for 8: Output: best-performing policy $\pi_t$ selected from $\{\pi_0, \pi_1, \dots, \pi_T\}$ using a validation set. |
| Open Source Code | No | Code is not available at the time of submission but will be available as an open-source repository upon acceptance. |
| Open Datasets | Yes | For pre-training, we utilize the antibody sequence data from the Observed Antibody Space database [Olsen et al., 2022]. Following Gao et al. [2023], we adopt the same preprocessing steps including sequence filtering and clustering. Since we focus on CDR-H3 design, we select 50 million heavy chain sequences to pre-train the model. To transfer the knowledge, we use the antibody-antigen data with structural information from the SAbDab database [Dunbar et al., 2014]. |
| Dataset Splits | Yes | During training, we split the clusters into a training set of 2,340 clusters and a validation set of 233 clusters. For testing, we borrow the RAbD benchmark [Adolf-Bryfogle et al., 2017] and select 42 legal complexes not used in training. |
| Hardware Specification | Yes | We perform evaluation for every 1000 training steps and train the model on one NVIDIA GeForce GTX A100 GPU, and it can converge within 36 hours and 200k steps. |
| Software Dependencies | No | AlignAb consists of two parts: a pre-trained BERT model from AbGNN [Gao et al., 2023], and a pre-trained diffusion model from DiffAb [Luo et al., 2022]. For the pre-trained BERT model, our model uses a 12-layer Transformer model with a BERTbase configuration. We set the embedding size to 768 and the number of heads to 12... We utilize the Adam [Kingma and Ba, 2014] optimizer... We also utilize a learning rate scheduler, with factor=0.8, min_lr=5e-6, and patience=10. We perform evaluation for every 1000 training steps and train the model on one NVIDIA GeForce GTX A100 GPU, and it can converge within 36 hours and 200k steps. The paper mentions various tools and models (BERT, Adam optimizer, PyRosetta, MMseqs2) but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Transferring. We train the diffusion model part of AlignAb following the same procedure as Luo et al. [2022]. The optimization goal is to minimize the rotation, position, and sequence loss. We apply the same weight to each loss during training. We utilize the Adam [Kingma and Ba, 2014] optimizer with init_learning_rate=1e-4, betas=(0.9,0.999), batch_size=16, and clip_gradient_norm=100. We also utilize a learning rate scheduler, with factor=0.8, min_lr=5e-6, and patience=10... Alignment. After obtaining the diffusion model, we further align it with energy-based preferences provided by domain experts. We utilize the Adam [Kingma and Ba, 2014] optimizer with init_learning_rate=2e-7, betas=(0.9,0.999), batch_size=8, clip_gradient_norm=100. We set the KL regularization term β = 100.0. In each batch, we select 8 pairs of energy-based preference data with labeled rewards. We do not use learning rate scheduling during the alignment stage. For rewards, we set $w_{\text{att}}$ and $w_{\text{rep}}$ with a fixed ratio 1:3. In each alignment iteration, we fine-tune the diffusion model for 4k steps. |
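
The pairwise objective in step 6 of Algorithm 1 has the shape of a DPO-style preference loss, and a small numeric sketch may help when reading the pseudocode. The code below is an illustration under stated assumptions, not the authors' implementation (their code was not released at submission time). The function names, the scalar log-likelihood inputs, the optional `margin` argument, and the rule that the higher-reward response is the preferred one are all hypothetical choices made for this example. Only the default `beta=100.0` comes from the reported alignment setup.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def label_pair(y1, y2, r1: float, r2: float):
    """Order two sampled responses as (winner, loser).

    Assumption: the response with the higher reward is preferred.
    """
    return (y1, y2) if r1 >= r2 else (y2, y1)


def pairwise_preference_loss(
    logp_w: float,
    logp_l: float,
    ref_logp_w: float,
    ref_logp_l: float,
    beta: float = 100.0,   # KL coefficient reported for the alignment stage
    margin: float = 0.0,   # hypothetical reward-dependent offset
) -> float:
    """DPO-style loss for one (winner, loser) pair.

    Inputs are log-likelihoods of each response under the current
    policy (logp_*) and under the frozen reference model (ref_logp_*).
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref).
    implicit_w = beta * (logp_w - ref_logp_w)
    implicit_l = beta * (logp_l - ref_logp_l)
    return -log_sigmoid(implicit_w - implicit_l - margin)
```

When the policy equals the reference model and `margin` is zero, both implicit rewards vanish and the loss is log 2. Raising the policy's likelihood of the winner relative to the reference lowers the loss, which is the direction the update in step 6 pushes.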