Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

P(all-atom) Is Unlocking New Path For Protein Design

Authors: Wei Qu, Jiawei Guan, Rui Ma, Ke Zhai, Weikun Wu, Haobo Wang

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that Pallatom excels in key metrics of protein design, including designability, diversity, and novelty, showing significant improvements across the board. Our extensive experiments show that by learning P(all-atom), high-quality all-atom proteins can be successfully generated.
Researcher Affiliation | Collaboration | 1 Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China; 2 LEVINTHAL Biotechnology Co., Ltd., Hangzhou, China. Correspondence to: Weikun Wu <EMAIL>, Haobo Wang <EMAIL>.
Pseudocode | Yes | Algorithm 1 (Pallatom Inference), Algorithm 2 (Main Trunk), Algorithm 3 (Template Embedder), Algorithm 4 (Atom Feature Encoder), Algorithm 5 (Atom Attention Decoder), Algorithm 6 (Node Update), Algorithm 7 (Pair Update), Algorithm 8 (Smooth LDDT Loss).
Open Source Code | Yes | Code Availability: Pallatom is available on GitHub (https://github.com/levinthal/Pallatom).
Open Datasets | Yes | The training dataset of the model includes the PDB (Zardecki et al., 2022) and the AlphaFold Database (AFDB) (Varadi et al., 2021).
Dataset Splits | No | The paper describes extensive data cleaning and filtering applied to the PDB and AFDB datasets (Appendix B), resulting in a curated dataset of 27,697 protein structures. However, it does not explicitly provide specific train/validation/test splits (percentages, counts, or predefined splits) for reproducing experiments on this data.
Hardware Specification | Yes | Training time: 10 days on 4× NVIDIA A6000 GPUs. All methods were tested on the same hardware: CPU AMD EPYC 7402 @ 2.8 GHz, GPU NVIDIA GeForce RTX 4090 with 24 GB VRAM.
Software Dependencies | No | The paper mentions using the "Adam optimizer" and "JAX's JIT compilation" but does not specify version numbers for these or any other key software libraries or frameworks used in their implementation.
Experiment Setup | Yes | The model training utilized the Adam optimizer (Kingma & Ba, 2017) with a learning rate of 1e-3, β1 = 0.9, β2 = 0.999, and a batch size of 32. Table 6 (Pallatom training hyperparameters) provides detailed settings including loss weights, diffusion timesteps, noise schedule parameters, transformer dimensions, and number of decoder units.
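The Pseudocode row lists Algorithm 8 (Smooth LDDT Loss). As a reading aid, the sigmoid-smoothed LDDT idea can be sketched in NumPy: score each atom pair within a cutoff by how closely predicted and target pairwise distances agree, smoothed with sigmoids at the four standard LDDT thresholds (0.5, 1, 2, 4 Å). This is a minimal illustration of the general technique, not a reproduction of the paper's Algorithm 8; the cutoff value and function shape here are assumptions.

```python
import numpy as np

def smooth_lddt_loss(pred, target, cutoff=15.0):
    """Differentiable LDDT-style loss between two (N, 3) coordinate sets.

    For each atom pair within `cutoff` (measured on the target), the
    distance error is scored with sigmoids at thresholds 0.5/1/2/4 A,
    then averaged; the loss is 1 minus that agreement score.
    """
    dp = np.linalg.norm(pred[:, None] - pred[None, :], axis=-1)
    dt = np.linalg.norm(target[:, None] - target[None, :], axis=-1)
    diff = np.abs(dp - dt)
    # sigmoid(t - diff): near 1 when the error is well under threshold t
    score = sum(1.0 / (1.0 + np.exp(diff - t)) for t in (0.5, 1.0, 2.0, 4.0)) / 4.0
    n = pred.shape[0]
    mask = (dt < cutoff) & ~np.eye(n, dtype=bool)  # scored pairs, excluding self-pairs
    lddt = (score * mask).sum() / max(mask.sum(), 1)
    return 1.0 - lddt
```

Because every threshold term is a sigmoid rather than a hard step, the loss stays differentiable, which is what makes an LDDT-style objective usable for gradient-based training.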
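Since the Dataset Splits row notes that no train/validation/test partition of the 27,697 curated structures is given, a reproducibility-minded reader would have to define one. A seeded index split like the sketch below is one conventional way to do that; the 5%/5% fractions are illustrative assumptions, not values from the paper.

```python
import numpy as np

def make_splits(n_items, val_frac=0.05, test_frac=0.05, seed=0):
    """Deterministic train/val/test index split over n_items examples."""
    rng = np.random.default_rng(seed)  # fixed seed makes the split reproducible
    idx = rng.permutation(n_items)
    n_val = int(n_items * val_frac)
    n_test = int(n_items * test_frac)
    return {
        "test": idx[:n_test],
        "val": idx[n_test:n_test + n_val],
        "train": idx[n_test + n_val:],
    }

splits = make_splits(27697)  # size of the paper's curated dataset
```

For protein structures specifically, a random split like this can leak homologous sequences across partitions; a sequence- or cluster-based split would be the stricter choice, but that requires clustering information the report does not cover.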
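The Software Dependencies row flags that no library versions are reported. When reproducing such a setup, a standard first step is to record the versions actually installed; the snippet below does this with the standard library. The package names are assumptions about a typical JAX training stack, not a list from the paper.

```python
from importlib import metadata

# Record whatever versions are installed locally, since the paper pins none.
for pkg in ("numpy", "jax", "optax"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```

Committing the resulting version list (for example via `python -m pip freeze`) alongside the code is what would close the gap this row identifies.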
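The Experiment Setup row fully specifies the optimizer, which can be sketched as a single Adam update (Kingma & Ba, 2017) with the reported settings. This is a minimal NumPy illustration of the published update rule, not the authors' JAX implementation; the epsilon value is the common default and an assumption here.

```python
import numpy as np

# Reported settings: Adam, lr 1e-3, beta1 = 0.9, beta2 = 0.999, batch size 32.
LR, BETA1, BETA2, EPS = 1e-3, 0.9, 0.999, 1e-8

def adam_step(param, grad, m, v, t):
    """One Adam update; t is the 1-indexed step count for bias correction."""
    m = BETA1 * m + (1 - BETA1) * grad          # first-moment EMA
    v = BETA2 * v + (1 - BETA2) * grad ** 2     # second-moment EMA
    m_hat = m / (1 - BETA1 ** t)                # bias-corrected moments
    v_hat = v / (1 - BETA2 ** t)
    param = param - LR * m_hat / (np.sqrt(v_hat) + EPS)
    return param, m, v
```

On the very first step the bias-corrected moments cancel the gradient's magnitude, so the parameter moves by roughly the learning rate (1e-3) in the direction opposite the gradient, regardless of the gradient's scale.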