Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

E(3)-equivariant models cannot learn chirality: Field-based molecular generation

Authors: Alexandru Dumitrescu, Dani Korpela, Markus Heinonen, Yogesh Verma, Valerii Iakovlev, Vikas Garg, Harri Lähdesmäki

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental The proposed model captures all molecular geometries including chirality, while still achieving highly competitive performance with E(3)-based methods across standard benchmarking metrics. Code is available at https://dumitrescu-alexandru.github.io/FMG-web/. We report the basic molecule properties and molecule graph quality metrics in Tables 1 and 2, while molecule conformation and conditional generation results can be found in Appendix A.1 and A.2. In Table 2, we see the same effect mentioned in the QM9 experiments, where the method has a worse TVa metric, likely caused by our method favoring other metrics to correct atom counts.
Researcher Affiliation Collaboration Department of Computer Science, Aalto University; YaiYai Ltd. Correspondence: {alexandru.dumitrescu}@aalto.fi
Pseudocode Yes Algorithm 1: Peak extraction
Input: field u ∈ R^{Nx × Ny × Nz}, threshold t
q ← {x : u(x) > t}; A_m ← {}
repeat
    p ← q.pop()
    neigh ← get_neigh(p, t, u)
    A_m.insert(mean(neigh, p))
    q.remove(neigh)
until q is empty
return A_m
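The peak-extraction pseudocode can be sketched in Python. This is an illustrative reading, not the authors' implementation: `get_neigh` is assumed to collect the connected above-threshold neighborhood of a voxel, approximated here by a 6-connected flood fill.

```python
import numpy as np
from collections import deque

def extract_peaks(u, t):
    """Sketch of Algorithm 1: cluster voxels with u(x) > t and return
    one centroid per connected cluster (the extracted atom positions A_m)."""
    q = {tuple(idx) for idx in np.argwhere(u > t)}  # q <- {x : u(x) > t}
    peaks = []
    while q:                      # repeat ... until q is empty
        p = q.pop()
        # Flood-fill stands in for get_neigh(p, t, u): gather the
        # 6-connected component of above-threshold voxels around p.
        component, frontier = [p], deque([p])
        while frontier:
            x, y, z = frontier.popleft()
            for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
                n = (x + dx, y + dy, z + dz)
                if n in q:        # q.remove(neigh)
                    q.remove(n)
                    component.append(n)
                    frontier.append(n)
        peaks.append(np.mean(component, axis=0))  # A_m.insert(mean(...))
    return np.array(peaks)
```

For a field with two well-separated bright voxels, this returns two centroids, one per cluster.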
Open Source Code Yes Code is available at https://dumitrescu-alexandru.github.io/FMG-web/.
Open Datasets Yes QM9 (Ramakrishnan et al., 2014) is a small molecule dataset, containing 134k molecules... The Geometric Ensemble Of Molecules (GEOM) dataset (Axelrod & Gómez-Bombarelli, 2022) contains molecules of up to 181 atoms and 37 million conformations along with their corresponding energies.
Dataset Splits Yes We use the splits from (Hoogeboom et al., 2022), with 100k, 18k, and 10k molecules for training, validation, and testing respectively.
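The reported 100k/18k/10k split could be reproduced along these lines; the seed and permutation scheme are assumptions for illustration, not the authors' actual split files from Hoogeboom et al. (2022).

```python
import numpy as np

def split_indices(n_total, n_train=100_000, n_val=18_000, n_test=10_000, seed=0):
    """Illustrative train/val/test index split (seed is an assumption)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)          # shuffle all molecule indices
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

train, val, test = split_indices(128_000)    # 100k + 18k + 10k molecules
```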
Hardware Specification Yes For the QM9 experiments, we use four A100 GPUs (40GB memory version) for 180 hours, and 1.2 million iterations (or about 780 epochs). On the GEOM-Drugs dataset, we used the same four A100 GPUs for 330 hours and trained our explicit H GEOM-Drugs model for 1.32 million iterations (6 epochs), and our implicit one for 1.2 million iterations (5.5 epochs).
Software Dependencies No The paper mentions 'Adam optimizer' for optimization and 'U-Net architecture' for the denoiser, but it does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.9, Python 3.8, TensorFlow 2.x).
Experiment Setup Yes We optimize the L_simple objective using the Adam optimizer with an 8 × 10⁻⁵ learning rate and (0.9, 0.99) for Adam's β parameters. The models are trained using batch sizes of 32 and 64 for the GEOM-Drugs and QM9 datasets, respectively.
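The reported optimizer settings translate directly to a framework configuration; the snippet below is a sketch assuming PyTorch (which the paper does not name), with a placeholder module standing in for the U-Net denoiser.

```python
import torch

# Placeholder for the actual U-Net denoiser (architecture not reproduced here).
model = torch.nn.Linear(16, 16)

# Reported settings: Adam, lr = 8e-5, betas = (0.9, 0.99).
optimizer = torch.optim.Adam(model.parameters(), lr=8e-5, betas=(0.9, 0.99))

batch_size = 64  # QM9; the paper uses 32 for GEOM-Drugs
```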