Learning symmetries via weight-sharing with doubly stochastic tensors

Authors: Putri van der Linden, Alejandro García-Castellanos, Sharvaree Vadgama, Thijs Kuipers, Erik Bekkers

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical results on image benchmarks, demonstrating the effectiveness of our approach in learning relevant weight-sharing schemes when there are clear symmetries.
Researcher Affiliation | Academia | ¹Amsterdam Machine Learning Lab, University of Amsterdam; ²Department of Biomedical Engineering and Physics, Amsterdam UMC, the Netherlands; ³Department of Radiology and Nuclear Medicine, Amsterdam UMC, the Netherlands
Pseudocode | Yes | $$S^0(X) = \exp(X), \qquad S^l(X) = T_c\big(T_r(S^{l-1}(X))\big), \qquad S_{NS} = \lim_{l \to \infty} S^l(X), \tag{8}$$ with $T_c$ and $T_r$ the normalization operators over the columns and rows, respectively, defined as $$T_c(X) = X \oslash \underbrace{1_N 1_N^T X}_{\mathrm{sum}_c(X)} \quad \text{and} \quad T_r(X) = X \oslash \underbrace{X\, 1_N 1_N^T}_{\mathrm{sum}_r(X)},$$ where $\oslash$ denotes elementwise division and $\mathrm{sum}_c(\cdot)$, $\mathrm{sum}_r(\cdot)$ perform column-wise and row-wise summation, respectively. (A hedged code sketch of this iteration is given below the table.)
Open Source Code | Yes | Code is available at https://github.com/computri/learnable-weight-sharing.
Open Datasets | Yes | Specifically, we evaluate our model on MNIST images that have been rotated (with full SO(2) rotations) and scaled (with scaling factors between [0.3, 1.0]). ... and CIFAR-10 with flips as a dataset with unknown symmetries. (A sketch of comparable preprocessing appears below the table.)
Dataset Splits | No | The paper mentions “Test accuracy” in its tables and discussion, but does not explicitly detail the training, validation, and test splits (e.g., percentages or sample counts) in the main text.
Hardware Specification | Yes | All the experiments were done on a single GPU with 24GB memory under six hours.
Software Dependencies | No | The paper mentions the availability of code but does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup | Yes | Model architecture: For all MNIST experiments, a simple 5-block CNN was used. Each block uses a kernel size of 5 and is succeeded by instance norm and ReLU activation, respectively. ... The models used a learning rate of 1e-2 and were trained for 100 epochs. (A hedged sketch of such a setup follows below.)
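
As a rough illustration of the iteration quoted in the Pseudocode row (Eq. (8)), the sketch below implements Sinkhorn-style normalization in PyTorch. The function name `sinkhorn_normalize`, the fixed iteration count, and the truncation of the limit $l \to \infty$ to a finite number of steps are assumptions for illustration, not the authors' implementation; see the linked repository for that.

```python
import torch

def sinkhorn_normalize(X: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Approximate the doubly stochastic projection S_NS(X) of Eq. (8).

    Starts from S^0(X) = exp(X) and alternately applies T_r (divide by row
    sums) and T_c (divide by column sums); a fixed number of iterations
    stands in for the limit l -> infinity.
    """
    S = torch.exp(X)                         # S^0(X) = exp(X), elementwise
    for _ in range(n_iters):
        S = S / S.sum(dim=1, keepdim=True)   # T_r: normalize rows, uses sum_r(S)
        S = S / S.sum(dim=0, keepdim=True)   # T_c: normalize columns, uses sum_c(S)
    return S

# Example: a random logit matrix becomes approximately doubly stochastic.
S = sinkhorn_normalize(torch.randn(4, 4))
print(S.sum(dim=0))  # ~ [1, 1, 1, 1]
print(S.sum(dim=1))  # ~ [1, 1, 1, 1]
```

After each iteration the column sums are exactly one and the row sums approach one, which is why a modest number of iterations already yields a near doubly stochastic matrix.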
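The dataset variants described in the Open Datasets row could be approximated with standard torchvision transforms, as in the sketch below. The root path, transform composition, and interpolation defaults are assumptions; the authors' exact preprocessing pipeline may differ.

```python
from torchvision import transforms
from torchvision.datasets import MNIST, CIFAR10

# Assumed preprocessing mirroring the described benchmark variants.
rotated_mnist_tf = transforms.Compose([
    transforms.RandomRotation(degrees=(0, 360)),             # full SO(2) rotations
    transforms.ToTensor(),
])
scaled_mnist_tf = transforms.Compose([
    transforms.RandomAffine(degrees=0, scale=(0.3, 1.0)),    # scale factors in [0.3, 1.0]
    transforms.ToTensor(),
])
flipped_cifar_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                  # flips (unknown-symmetry case)
    transforms.ToTensor(),
])

rotated_mnist = MNIST(root="data", train=True, download=True, transform=rotated_mnist_tf)
flipped_cifar = CIFAR10(root="data", train=True, download=True, transform=flipped_cifar_tf)
```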
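Finally, the Experiment Setup row translates into roughly the following PyTorch skeleton. Channel widths, padding, pooling, the output head, and the choice of Adam are assumptions; only the kernel size, normalization, activation, learning rate, and epoch count are quoted above.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """One block as described: 5x5 convolution -> instance norm -> ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),  # kernel size 5 (quoted)
        nn.InstanceNorm2d(out_ch),                           # instance norm (quoted)
        nn.ReLU(),                                            # ReLU activation (quoted)
    )

# Hypothetical 5-block CNN for 28x28 MNIST inputs; channel widths are illustrative.
widths = [1, 32, 32, 32, 32, 32]
model = nn.Sequential(
    *[conv_block(widths[i], widths[i + 1]) for i in range(5)],
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(widths[-1], 10),
)

# Optimizer choice is an assumption; the quoted setup specifies lr=1e-2 and 100 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
num_epochs = 100
```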