E(n) Equivariant Message Passing Simplicial Networks
Authors: Floor Eijkelboom, Rob Hesselink, Erik J Bekkers
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents E(n) Equivariant Message Passing Simplicial Networks (EMPSNs), a novel approach to learning on geometric graphs and point clouds that is equivariant to rotations, translations, and reflections. [...] The results indicate that EMPSNs can leverage the benefits of both approaches, leading to a general increase in performance when compared to either method individually, being on par with state-of-the-art approaches for learning on geometric graphs. [...] We experimentally show that the use of higher-dimensional simplex learning improves performance compared to EGNNs and MPSNs, without requiring more parameters. This improvement is also found in datasets with few higher-dimensional simplices. We finally show that EMPSNs are competitive with state-of-the-art approaches on graphs, as illustrated in the N-body experiment and QM9. We also show that this improvement is obtained without a much greater forward time than other existing approaches. |
| Researcher Affiliation | Academia | University of Amsterdam. Correspondence to: Floor Eijkelboom <eijkelboomfloor@gmail.com>. |
| Pseudocode | Yes | In Appendix B. Implementation: The features are embedded using a linear embedding: Initial Feature → {Linear Layer} → Embedded Feature. For each adjacency, we learn a message function, e.g. for the adjacency A from τ to σ: [h_σ, h_τ, Inv(σ, τ)] → {Linear Layer → Swish → Linear Layer → Swish} → m^A_{σ,τ}, where [·, ·] denotes concatenation. For each message, an edge importance is computed: m^A_σ → {Linear Layer → Sigmoid} → e^A_σ. For each simplex type, we learn an update for the simplex based on the different adjacency updates, i.e. [h_σ, {m^A_σ}, {e^A_σ}] → {[h_σ, ⊕_A e^A_σ m^A_σ] → Linear Layer → Swish → Linear Layer → Addition(h_σ)} → h′_σ. A final readout: {h^n_i} → {Linear Layer → Swish → Linear Layer → ⊕_i h^n_i → Linear Layer → Swish → Linear Layer} → Prediction. These learnable functions are the same across all experiments. In all experiments, we included boundary, co-boundary, and upper-adjacent communication. Moreover, if the initial graph has velocities, we update the position using two MLPs, similarly to Satorras et al. (2021): a new velocity is computed as v_i = φ_v(h_i) · v_i^init + C Σ_{j≠i} (x_i − x_j) φ_x(m_ij), and the position is updated using the velocity, x′_i = x_i + v_i. Both φ_v and φ_x are two-layer MLPs with a Swish activation function, i.e. Input → {Linear Layer → Swish → Linear Layer} → Output. (An illustrative code sketch of such a layer is given below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or a direct link to the open-source code for the methodology described. |
| Open Datasets | Yes | QM9: The QM9 dataset (Ramakrishnan et al., 2014) is a molecular dataset consisting of small molecules containing at most 29 atoms embedded in 3-dimensional space. [...] N-body system: As introduced in Kipf et al. (2018), the N-body system experiment considers the trajectory in 3-dimensional space of 5 charged particles over time. |
| Dataset Splits | Yes | For the data, we used the common split of 100K molecules for training, 10K molecules for testing, and the rest for validation. [...] For the data, we used the same setup as used in Satorras et al. (2021), i.e. we used 3,000 training trajectories, 2,000 validation trajectories, and 2,000 test trajectories, where each trajectory contains 1,000 time steps. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions the Adam optimizer (Kingma & Ba, 2014) and a Cosine Annealing learning rate scheduler (Loshchilov & Hutter, 2016), and mentions PyTorch Geometric and GUDHI implementations in Appendix A, but it does not specify exact version numbers for any software components. |
| Experiment Setup | Yes | The models are optimized using Adam (Kingma & Ba, 2014) with an initial learning rate of η = 5·10⁻⁴ and a Cosine Annealing learning rate scheduler (Loshchilov & Hutter, 2016). The loss used for optimization is the Mean Absolute Error. All predicted properties have been normalized by first subtracting the mean of the target in the training set and then dividing by the mean absolute deviation in the training set to stabilize training. We used a batch size of 128 molecules per batch and a weight decay of 10⁻¹⁶. Last, we endowed the message and update functions with batch normalization. [...] The optimization is done using Adam, with a constant learning rate of η = 5·10⁻⁴, a batch size of 100, and a weight decay of 10⁻¹². Moreover, the invariant features are embedded using Gaussian Fourier features as introduced in Tancik et al. (2020). The loss minimized is the MSE in the predicted position. (A minimal training-loop sketch based on the QM9 setup follows the table.) |
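
The Appendix B pseudocode quoted in the Pseudocode row maps naturally onto a small PyTorch module. The sketch below is an illustrative reconstruction, not the authors' released code: the class name `EMPSNLayer`, the argument names, and the dictionary-based handling of adjacency types are assumptions, while the MLP shapes (two Linear layers with Swish/SiLU), the sigmoid edge-importance gate, and the residual simplex update follow the quoted description.

```python
# Illustrative reconstruction of one EMPSN-style layer from the quoted
# Appendix B pseudocode. Class and argument names are assumptions; the MLP
# structure (Linear-Swish-Linear-Swish messages, sigmoid gating, residual
# update) follows the description in the table above.
import torch
import torch.nn as nn


class EMPSNLayer(nn.Module):
    def __init__(self, hidden_dim, num_inv, adjacencies=("boundary", "co_boundary", "upper")):
        super().__init__()
        self.adjacencies = list(adjacencies)
        # Message MLP per adjacency type A: [h_sigma, h_tau, Inv(sigma, tau)] -> m^A
        self.message_mlp = nn.ModuleDict({
            adj: nn.Sequential(
                nn.Linear(2 * hidden_dim + num_inv, hidden_dim), nn.SiLU(),  # SiLU == Swish
                nn.Linear(hidden_dim, hidden_dim), nn.SiLU(),
            ) for adj in self.adjacencies
        })
        # Edge importance per adjacency type: m^A -> e^A in (0, 1)
        self.edge_gate = nn.ModuleDict({
            adj: nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
            for adj in self.adjacencies
        })
        # Simplex update: [h_sigma, aggregated gated messages] -> residual update of h_sigma
        self.update_mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, h, index, inv):
        """h: [N, hidden_dim] simplex features; index[adj] = (senders, receivers)
        LongTensors; inv[adj]: [E_adj, num_inv] geometric invariants Inv(sigma, tau)."""
        agg = torch.zeros_like(h)
        for adj in self.adjacencies:
            send, recv = index[adj]
            m = self.message_mlp[adj](torch.cat([h[recv], h[send], inv[adj]], dim=-1))
            m = self.edge_gate[adj](m) * m          # weight each message by its importance
            agg = agg.index_add(0, recv, m)         # aggregate over senders and adjacencies
        return h + self.update_mlp(torch.cat([h, agg], dim=-1))  # Addition(h_sigma)
```

A full model would stack such layers per simplex dimension, embed the initial features with a linear layer, and apply the quoted two-stage readout; the position/velocity update used for the N-body experiment follows Satorras et al. (2021) as described in the Pseudocode row.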
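For the QM9 setup quoted in the Experiment Setup row, a minimal training loop could look as follows. This is a sketch under assumptions: `model`, `loader`, `train_targets`, `num_epochs`, and the `batch.y` target attribute (PyTorch Geometric convention) are placeholders; only the quoted hyperparameters (Adam, learning rate 5·10⁻⁴, weight decay 10⁻¹⁶, cosine annealing, MAE loss, batch size 128, normalization by training-set mean and mean absolute deviation) come from the paper.

```python
# Sketch of the quoted QM9 optimization setup. `model`, `loader`, `train_targets`,
# `num_epochs`, and the `batch.y` attribute are assumptions/placeholders.
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR


def train_qm9(model, loader, train_targets, num_epochs):
    optimizer = Adam(model.parameters(), lr=5e-4, weight_decay=1e-16)
    scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)
    criterion = torch.nn.L1Loss()  # Mean Absolute Error

    # Normalize targets with training-set statistics to stabilize training.
    mean = train_targets.mean()
    mad = (train_targets - mean).abs().mean()  # mean absolute deviation

    for _ in range(num_epochs):
        for batch in loader:                         # 128 molecules per batch
            optimizer.zero_grad()
            pred = model(batch)
            loss = criterion(pred, (batch.y - mean) / mad)
            loss.backward()
            optimizer.step()
        scheduler.step()
```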