Learning symmetries via weight-sharing with doubly stochastic tensors
Authors: Putri van der Linden, Alejandro García-Castellanos, Sharvaree Vadgama, Thijs Kuipers, Erik Bekkers
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results on image benchmarks, demonstrating the effectiveness of our approach in learning relevant weight-sharing schemes when there are clear symmetries. |
| Researcher Affiliation | Academia | (1) Amsterdam Machine Learning Lab, University of Amsterdam; (2) Department of Biomedical Engineering and Physics, Amsterdam UMC, the Netherlands; (3) Department of Radiology and Nuclear Medicine, Amsterdam UMC, the Netherlands |
| Pseudocode | Yes | $S^0(X) = \exp(X)$, $S^l(X) = T_c(T_r(S^{l-1}(X)))$, $S_{\mathrm{NS}}(X) = \lim_{l \to \infty} S^l(X)$ (Eq. 8), with $T_c$ and $T_r$ the normalization operators over the columns and rows, respectively, defined as $T_c(X) = X \oslash \underbrace{\mathbf{1}_N \mathbf{1}_N^\top X}_{\mathrm{sum}_c(X)}$ and $T_r(X) = X \oslash \underbrace{X \mathbf{1}_N \mathbf{1}_N^\top}_{\mathrm{sum}_r(X)}$, where $\oslash$ denotes elementwise division and $\mathrm{sum}_c(\cdot)$, $\mathrm{sum}_r(\cdot)$ perform column-wise and row-wise summation, respectively. (A code sketch of this iteration appears below the table.) |
| Open Source Code | Yes | Code is available at https://github.com/computri/learnable-weight-sharing. |
| Open Datasets | Yes | Specifically, we evaluate our model on MNIST images that have been rotated (with full SO(2) rotations) and scaled (with scaling factors between [0.3, 1.0]). ... and CIFAR-10 with flips as a dataset with unknown symmetries. |
| Dataset Splits | No | The paper mentions “Test accuracy” in its tables and discussion, but does not explicitly detail the training, validation, and test splits (e.g., percentages or sample counts) in the main text. |
| Hardware Specification | Yes | All the experiments were done on a single GPU with 24GB memory in under six hours. |
| Software Dependencies | No | The paper mentions the availability of code but does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | Model architecture: For all MNIST experiments, a simple 5-block CNN was used. Each block uses a kernel size of 5 and is succeeded by instance norm and ReLU activation, respectively. ... The models used a learning rate of 1e-2 and were trained for 100 epochs. (A sketch of this setup appears below the table.) |
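Below is a minimal PyTorch sketch of the Sinkhorn iteration quoted in the Pseudocode row, written directly from Eq. 8; the function name `sinkhorn` and the `n_iters`/`eps` defaults are illustrative assumptions, not values from the authors' repository.

```python
import torch

def sinkhorn(X: torch.Tensor, n_iters: int = 20, eps: float = 1e-8) -> torch.Tensor:
    """Approximate S_NS(X) from Eq. 8: start at S^0(X) = exp(X), then
    alternate row (T_r) and column (T_c) normalization until the result
    is approximately doubly stochastic."""
    S = torch.exp(X)                                  # S^0(X) = exp(X)
    for _ in range(n_iters):
        S = S / (S.sum(dim=-1, keepdim=True) + eps)   # T_r: divide by row sums
        S = S / (S.sum(dim=-2, keepdim=True) + eps)   # T_c: divide by column sums
    return S

# Both row sums and column sums converge to one.
S = sinkhorn(torch.randn(4, 4))
print(S.sum(dim=0), S.sum(dim=1))
```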
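The Experiment Setup row pins down the block structure, kernel size, learning rate, and epoch count but not the channel widths or classifier head, so the sketch below fills those in with assumed values (32 channels, global average pooling, a linear head, and an Adam optimizer):

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # One block as quoted: 5x5 convolution followed by instance norm and ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(),
    )

# 5-block CNN for MNIST; the channel width of 32 and the pooling/linear head
# are assumptions, while the block structure itself is quoted from the paper.
model = nn.Sequential(
    conv_block(1, 32),
    *[conv_block(32, 32) for _ in range(4)],
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

# Learning rate 1e-2 and 100 epochs are quoted; the optimizer type is assumed.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
```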