Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly

Authors: Hongyu Guo, Yoshua Bengio, Shengchao Liu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical validation on the benchmarking data COD-Cluster17 shows that AssembleFlow significantly outperforms six competitive deep learning baselines by at least 45% in assembly matching scores while maintaining 100% molecular integrity. Also, it matches the assembly performance of a widely used domain-specific simulation tool while reducing computational cost by 25-fold. We empirically evaluate AssembleFlow using the benchmarking crystallization dataset COD-Cluster17. The quantitative results reveal that AssembleFlow significantly outperforms six competitive deep learning baselines by at least 45% in terms of assembly matching score. Also, AssembleFlow exhibits strong assembly performance compared to a widely used domain-specific simulation tool for molecular assembly, achieving this with a 25-fold reduction in computational cost. Furthermore, we present qualitative results, including atomic collision properties of predicted crystals, which further demonstrate AssembleFlow's effectiveness in preserving and modeling the rigidity of the molecular crystallization and assembly process.
Researcher Affiliation	Academia	Hongyu Guo (National Research Council Canada; University of Ottawa; EMAIL). Yoshua Bengio (Mila Québec AI Institute; Université de Montréal; CIFAR AI Chair; EMAIL). Shengchao Liu (Université de Montréal; EMAIL).
Pseudocode	Yes	Note: the pseudo algorithm of our AssembleFlow is provided in Appendix E.3. A high-level overview and pseudo algorithm are provided in Algorithms 1 and 2 in Appendix E.3.
Open Source Code	Yes	The codes and checkpoints are available at this GitHub repository.
Open Datasets	Yes	We evaluate our method using the crystallization dataset COD-Cluster17 (Liu et al., 2024c). This COD-Cluster17 is a curated subset derived from the Crystallography Open Database (COD) database (Grazulis et al., 2009).
Dataset Splits	No	We evaluate our method using the crystallization dataset COD-Cluster17 (Liu et al., 2024c). This COD-Cluster17 contains 133K crystals and is a curated subset derived from the Crystallography Open Database (COD) database (Grazulis et al., 2009). We consider three versions of COD-Cluster17, with 5k, 10k, and all data, respectively.
Hardware Specification	No	YB acknowledges support from NRC AI4D, CIFAR, and the CIFAR AI Chair program. This project's computational resources are provided by NRC and the Digital Research Alliance of Canada.
Software Dependencies	No	For each molecule in the cluster, we adopt the SE(3)-equivariant PaiNN (Schütt et al., 2021) to obtain the representation for each atom. ... The outputs include a molecular level predicted rotation velocity $\hat{q}_\theta \in \mathbb{R}^{M \times 3}$ and predicted translation velocity $\hat{x}_\theta \in \mathbb{R}^{M \times 3}$, where $M$ is the number of molecules in the cluster. ... Optimization: seed {0, 42, 123}; epochs {1000, 2000}; cutoff c {20, 50}; learning rate {1e-4, 5e-4}; optimizer {Adam}.
Experiment Setup	Yes	We provide the key hyper-parameters of AssembleFlow in Table 6. Table 6: Hyperparameter specifications for AssembleFlow (columns: Model, Hyperparameter, Value). Intra-modeling PaiNN: embedding dim {128}; num of layers {3}; cutoff {5}; read out {mean}. Intra-modeling Atomic Level: num of layers {2, 5}; num of convolution {2}; num of head {4, 8}; num of timesteps {50, 200}; α0 {1}; α1 {1, 10}. Intra-modeling Molecular Level: num of layers {4, 5}; num of head {4, 8}; num of timesteps {50, 200}; α0 {1}; α1 {1, 10}. Optimization: seed {0, 42, 123}; epochs {1000, 2000}; cutoff c {20, 50}; learning rate {1e-4, 5e-4}; optimizer {Adam}.
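
The Optimization row of Table 6 lists sets of candidate values rather than a single configuration. As a rough illustration of how many distinct settings those sets span, the sketch below expands them into their Cartesian product. It assumes an exhaustive grid, which the quoted text does not state; the authors may have searched only a subset. The dictionary keys and the `expand_grid` helper are hypothetical names chosen for this example, not identifiers from the AssembleFlow code.

```python
from itertools import product

# Candidate values copied from the Optimization row of Table 6.
# Key names are illustrative only; they are not taken from the authors' code.
OPTIMIZATION_GRID = {
    "seed": [0, 42, 123],
    "epochs": [1000, 2000],
    "cutoff_c": [20, 50],
    "learning_rate": [1e-4, 5e-4],
    "optimizer": ["Adam"],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values (Cartesian product)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = expand_grid(OPTIMIZATION_GRID)
print(len(configs))  # 3 * 2 * 2 * 2 * 1 = 24 combinations
print(configs[0])
```

Under the full-grid assumption this yields 24 optimization settings, before accounting for the model-level choices (number of layers, heads, timesteps, α1) listed in the other rows.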