MatrixNet: Learning over symmetry groups using learned group representations
Authors: Lucas Laird, Circe Hsu, Asilata Bapat, Robin Walters
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use two learning tasks to evaluate the four variants of MatrixNet and compare our approach against several baseline models. We use several finite groups on a well-understood task as an initial test to validate our approach, and then move on to an infinite group, the braid group B3, on a task related to open problems. As baselines, we compare to an MLP for fixed maximum sequence length, and LSTM and Transformer models on longer sequences. Results of the experiments are summarized in Table 1. |
| Researcher Affiliation | Academia | Lucas Laird, Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115 (laird.l@northeastern.edu); Circe Hsu, Department of Mathematics, Northeastern University, Boston, MA 02115 (hsu.circe@northeastern.edu); Asilata Bapat, Mathematical Sciences Institute, Australian National University, Canberra, Australia (asilata.bapat@anu.edu.au); Robin Walters, Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115 (r.walters@northeastern.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/lucas-laird/MatrixNet. |
| Open Datasets | No | We generated a dataset of 500,000 samples consisting of words of the free group F10, with labels corresponding to their order as elements of S10. An initial dataset of Jordan-Hölder multiplicities for braid words up to length 6 was provided. We implemented a state automaton algorithm from [47] to generate additional examples for longer braid words. The paper does not provide a specific link, DOI, or repository name for these generated datasets. |
| Dataset Splits | Yes | The data was split into 60% training data, 20% validation data, and 20% test which were fixed for all models. |
| Hardware Specification | Yes | All of the categorical braid action experiments were run on a machine with a single Nvidia RTX 2080 Ti GPU. |
| Software Dependencies | No | Sample order labels in S10 are computed using the SymPy package [50]. The paper does not list specific version numbers for other key software components like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | All of the models were trained using an Adam optimizer with a learning rate of 1e-4 and a batch size of 128. The chosen parameters for the models are: MatrixNet: single-channel 14×14 matrix size; MatrixNet-LN: single-channel 10×10 matrix size, 128 dimensions for the linear network in the matrix block; MatrixNet-MC: 3-channel 8×8 matrix size; MatrixNet-NL: single-channel 10×10 matrix size, 128 hidden dimensions and a tanh non-linearity between the linear layers of the matrix block; MLP: 3-layer MLP with 128 hidden dimensions per layer and ReLU activation functions, followed by a single linear output layer; LSTM: 6 LSTM layers with 16-dimensional input embeddings and 32 hidden dimensions, followed by a 2-layer MLP classifier with 64 hidden dimensions and ReLU activation; Transformer: 3 transformer layers with 4 attention heads, 16-dimensional embeddings and 32 hidden dimensions, using mean pooling and a single linear output layer. All of the MatrixNet architectures used a 2-layer MLP with 128 hidden dimensions and ReLU activation to compute the output after the matrix block. |
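The order labels described in the Open Datasets row can be computed directly: the order of a permutation in S10 is the least common multiple of its cycle lengths. The sketch below is illustrative only; the paper reports using SymPy for this step, and `permutation_order` is a hypothetical helper, not the authors' code.

```python
from math import lcm


def permutation_order(perm):
    """Order of a permutation given in one-line notation.

    `perm[i]` is the (0-indexed) image of i. The order is the lcm of
    the lengths of the permutation's disjoint cycles.
    """
    seen = [False] * len(perm)
    order = 1
    for i in range(len(perm)):
        if not seen[i]:
            # Walk the cycle containing i and record its length.
            length, j = 0, i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            order = lcm(order, length)
    return order
```

For example, a permutation composed of a 2-cycle and a 3-cycle has order lcm(2, 3) = 6.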
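The fixed 60%/20%/20% split reported in the Dataset Splits row can be reproduced with a seeded shuffle. This is a minimal sketch under assumed conventions (`split_dataset` and the seed are illustrative, not taken from the paper's code):

```python
import random


def split_dataset(samples, seed=0):
    """Shuffle once with a fixed seed, then cut 60/20/20."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the split stable
    n_train = int(0.6 * len(samples))
    n_val = int(0.2 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

Fixing the seed is what makes the split identical across all models, as the paper requires.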
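The hyperparameters in the Experiment Setup row can be collected into a single configuration mapping for quick reference. The dictionary structure and key names below are an illustrative summary, not the authors' configuration format:

```python
# Optimizer settings shared by all models, as reported.
OPTIMIZER = {"name": "Adam", "lr": 1e-4, "batch_size": 128}

# Per-model architecture settings; names follow the paper's variants.
MODELS = {
    "MatrixNet":    {"channels": 1, "matrix_size": 14},
    "MatrixNet-LN": {"channels": 1, "matrix_size": 10, "linear_dim": 128},
    "MatrixNet-MC": {"channels": 3, "matrix_size": 8},
    "MatrixNet-NL": {"channels": 1, "matrix_size": 10,
                     "hidden_dim": 128, "nonlinearity": "tanh"},
    "MLP":          {"layers": 3, "hidden_dim": 128, "activation": "ReLU"},
    "LSTM":         {"layers": 6, "embed_dim": 16, "hidden_dim": 32,
                     "head": {"mlp_layers": 2, "hidden_dim": 64,
                              "activation": "ReLU"}},
    "Transformer":  {"layers": 3, "heads": 4, "embed_dim": 16,
                     "hidden_dim": 32, "pooling": "mean"},
}
```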