Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
E(3)-equivariant models cannot learn chirality: Field-based molecular generation
Authors: Alexandru Dumitrescu, Dani Korpela, Markus Heinonen, Yogesh Verma, Valerii Iakovlev, Vikas Garg, Harri Lähdesmäki
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed model captures all molecular geometries including chirality, while still achieving highly competitive performance with E(3)-based methods across standard benchmarking metrics. Code is available at https://dumitrescu-alexandru.github.io/FMG-web/. We report the basic molecule properties and molecule graph quality metrics in Tables 1 and 2, while molecule conformation and conditional generation results can be found in Appendix A.1 and A.2. In Table 2, we see the same effect mentioned in the QM9 experiments, where the method has a worse TVa metric, likely caused by our method favoring other metrics to correct atom counts. |
| Researcher Affiliation | Collaboration | Department of Computer Science, Aalto University; YaiYai Ltd. Correspondence: {alexandru.dumitrescu}@aalto.fi |
| Pseudocode | Yes | Algorithm 1 Peak extraction. Input: field u ∈ R^(Nx×Ny×Nz), threshold t. q ← {x : u(x) > t}; A_m ← {}; repeat: p ← q.pop(); neigh ← get_neigh(p, t, u); A_m.insert(mean(neigh ∪ {p})); q.remove(neigh); until q is empty; return A_m |
| Open Source Code | Yes | Code is available at https://dumitrescu-alexandru.github.io/FMG-web/. |
| Open Datasets | Yes | QM9 (Ramakrishnan et al., 2014) is a small molecule dataset, containing 134k molecules... The Geometric Ensemble Of Molecules (GEOM) dataset (Axelrod & Gómez-Bombarelli, 2022) contains molecules of up to 181 atoms and 37 million conformations along with their corresponding energies. |
| Dataset Splits | Yes | We use the splits from (Hoogeboom et al., 2022), with 100k, 18k, and 10k molecules for training, validation, and testing respectively. |
| Hardware Specification | Yes | For the QM9 experiments, we use four A100 GPUs (40GB memory version) for 180 hours, and 1.2 million iterations (or about 780 epochs). On the GEOM-Drugs dataset, we used the same four A100 GPUs for 330 hours and trained our explicit H GEOM-Drugs model for 1.32 million iterations (6 epochs), and our implicit one for 1.2 million iterations (5.5 epochs). |
| Software Dependencies | No | The paper mentions 'Adam optimizer' for optimization and 'U-Net architecture' for the denoiser, but it does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.9, Python 3.8, TensorFlow 2.x). |
| Experiment Setup | Yes | We optimize the L_simple objective using the Adam optimizer with an 8×10⁻⁵ learning rate and (0.9, 0.99) for Adam's β parameters. The models are trained using batch sizes of 32 and 64 for the GEOM-Drugs and QM9 datasets, respectively. |
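The peak-extraction pseudocode quoted above (Algorithm 1) groups above-threshold voxels of the generated field into connected regions and returns each region's mean coordinate as a candidate atom position. A minimal NumPy sketch, assuming 26-connected neighbourhoods (the paper's `get_neigh` is not specified in the excerpt, so the flood-fill details here are an illustration, not the authors' exact implementation):

```python
import numpy as np
from collections import deque

def extract_peaks(u, t):
    """Sketch of Algorithm 1 (peak extraction): cluster voxels of the
    field u (shape (Nx, Ny, Nz)) with density above threshold t into
    connected peaks, returning each peak's mean coordinate."""
    above = set(map(tuple, np.argwhere(u > t)))  # q = {x : u(x) > t}
    peaks = []                                   # A_m
    while above:                                 # repeat ... until q is empty
        seed = above.pop()                       # p = q.pop()
        component, frontier = [seed], deque([seed])
        while frontier:                          # flood-fill: get_neigh(p, t, u)
            p = frontier.popleft()
            for d in np.ndindex(3, 3, 3):        # 26-connected offsets (-1..1)^3
                n = tuple(np.array(p) + np.array(d) - 1)
                if n in above:
                    above.remove(n)              # q.remove(neigh)
                    component.append(n)
                    frontier.append(n)
        peaks.append(np.mean(component, axis=0)) # A_m.insert(mean(neigh ∪ {p}))
    return np.array(peaks)
```

Each returned coordinate is in voxel units; mapping back to Angstrom positions would depend on the field's grid resolution, which the excerpt does not specify.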
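The experiment-setup row reports only optimizer hyperparameters (Adam, learning rate 8×10⁻⁵, β = (0.9, 0.99)); since no framework is named, a framework-free sketch of a single Adam update with those values makes the setup concrete. The function and its arguments are illustrative placeholders, not the authors' code:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=8e-5, b1=0.9, b2=0.99, eps=1e-8):
    """One Adam update using the hyperparameters reported in the paper
    (lr = 8e-5, betas = (0.9, 0.99)); eps is a standard default, assumed."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction, step t >= 1
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step with unit gradients, the bias-corrected update moves each parameter by roughly the learning rate, which is why the reported 8×10⁻⁵ directly bounds the initial step size.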