Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport

Authors: Haokai Hong, Wanyu Lin, Kay Chen Tan

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments show that GOAT enjoys the efficiency of solving geometric optimal transport, leading to a double speedup compared to the sub-optimal method while achieving the best generation quality regarding validity, uniqueness, and novelty. The evaluation results are presented in Tables 1 and 2 with Figure 3.
Researcher Affiliation Academia Haokai Hong, Wanyu Lin, Kay Chen Tan, Department of Data Science and Artificial Intelligence, Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR, China. EMAIL, EMAIL
Pseudocode Yes A comprehensive proof is given in Appendix B, and the pseudocode for training and sampling is presented in Appendix C. Appendix C (Algorithms): This section contains the main algorithms of the proposed GOAT. First, we present the algorithms for solving optimal molecule transport and unified flow in Algorithm 1 and Algorithm 2, respectively. Algorithm 3 presents the pseudo-code for training GOAT. Algorithm 4 presents the process of fast molecule generation with GOAT.
Open Source Code Yes Finally, extensive experiments show that GOAT enjoys the efficiency of solving geometric optimal transport, leading to a double speedup compared to the sub-optimal method while achieving the best generation quality regarding validity, uniqueness, and novelty. The code is available at github.
Open Datasets Yes Datasets. We evaluate over benchmark datasets for 3D molecule generation, including QM9 (Ramakrishnan et al., 2014) and GEOM-DRUG (Axelrod & Gómez-Bombarelli, 2022). QM9 is a standard dataset that contains 130k 3D molecules with up to 29 atoms. GEOM-DRUG encompasses around 450K molecules, each with an average of 44 atoms and a maximum of 181 atoms. More dataset details are presented in Appendix E.
Dataset Splits Yes E.1 QM9 Dataset: QM9 (Ramakrishnan et al., 2014) is a comprehensive dataset that provides geometric, energetic, electronic, and thermodynamic properties for a subset of the GDB-17 database (Ruddigkeit et al., 2012), comprising a total of 130,831 molecules. We utilize the train/validation/test partitions delineated in (Anderson et al., 2019), comprising 100K, 18K, and 13K samples for each respective partition.
Hardware Specification Yes Hardware Configuration: 1. GPU: NVIDIA GeForce RTX 3090; 2. CPU: Intel(R) Xeon(R) Platinum 8338C; 3. Memory: 512 GB
Software Dependencies No The paper mentions software like RDKit (Landrum et al., 2016) and the Adam optimizer (Kingma & Ba, 2015), but it does not provide specific version numbers for these or other key software components (e.g., programming languages, deep learning frameworks).
Experiment Setup Yes In this study, all the neural networks utilized for the encoder, flow network, and decoder are implemented using EGNNs (Satorras et al., 2021). The dimension of latent invariant features, denoted as k, is set to 2 for QM9 and 1 for GEOM-DRUG, to map the molecule for a unified flow matching. For the training of the flow neural network, we employ EGNNs with 9 layers and 256 hidden features on QM9, and 4 layers and 256 hidden features on GEOM-DRUG, with a batch size of 64 and 16, respectively. All models utilize SiLU activations and are trained until convergence. Across all experiments, the Adam optimizer (Kingma & Ba, 2015) with a constant learning rate of 10⁻⁴ is chosen as our default training configuration. The training process for QM9 takes approximately 3000 epochs, while for GEOM-DRUG, it takes about 20 epochs.
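The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch for quick cross-checking. This is a minimal illustration only: the dict keys and variable names below are hypothetical conveniences, not identifiers from the GOAT codebase.

```python
# Hypothetical configuration sketch assembling the hyperparameters reported
# in the paper's experiment setup; key names are illustrative, not official.
QM9_CONFIG = {
    "backbone": "EGNN",         # encoder, flow network, and decoder
    "latent_invariant_dim": 2,  # k = 2 for QM9
    "num_layers": 9,            # EGNN depth for the flow network on QM9
    "hidden_features": 256,
    "batch_size": 64,
    "activation": "SiLU",
    "optimizer": "Adam",
    "learning_rate": 1e-4,      # constant throughout training
    "epochs": 3000,             # approximate, trained until convergence
}

# GEOM-DRUG differs only in the four settings overridden below.
DRUG_CONFIG = {
    **QM9_CONFIG,
    "latent_invariant_dim": 1,
    "num_layers": 4,
    "batch_size": 16,
    "epochs": 20,
}

if __name__ == "__main__":
    for name, cfg in (("QM9", QM9_CONFIG), ("GEOM-DRUG", DRUG_CONFIG)):
        print(f"{name}: {cfg['num_layers']} layers, "
              f"batch {cfg['batch_size']}, ~{cfg['epochs']} epochs")
```

Keeping the shared settings in one base dict and overriding only the per-dataset differences makes it easy to see which choices the paper varies (latent dimension, depth, batch size, epochs) versus which it holds fixed (hidden width, activation, optimizer, learning rate).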