Learning symmetries via weight-sharing with doubly stochastic tensors

Authors: Putri van der Linden, Alejandro García-Castellanos, Sharvaree Vadgama, Thijs Kuipers, Erik Bekkers

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical results on image benchmarks, demonstrating the effectiveness of our approach in learning relevant weight-sharing schemes when there are clear symmetries.
Researcher Affiliation | Academia | ¹Amsterdam Machine Learning Lab, University of Amsterdam; ²Department of Biomedical Engineering and Physics, Amsterdam UMC, the Netherlands; ³Department of Radiology and Nuclear Medicine, Amsterdam UMC, the Netherlands
Pseudocode | Yes | $$S^0(X) = \exp(X), \qquad S^l(X) = T_c\big(T_r(S^{l-1}(X))\big), \qquad S_{NS} = \lim_{l \to \infty} S^l(X), \tag{8}$$ with $T_c$ and $T_r$ the normalization operators over the columns and rows, respectively, defined as $$T_c(X) = X \oslash \underbrace{1_N 1_N^T X}_{\mathrm{sum}_c(X)} \quad \text{and} \quad T_r(X) = X \oslash \underbrace{X\, 1_N 1_N^T}_{\mathrm{sum}_r(X)},$$ where $\oslash$ denotes elementwise division and $\mathrm{sum}_c(\cdot)$, $\mathrm{sum}_r(\cdot)$ perform column-wise and row-wise summation, respectively. (A hedged code sketch of this iteration is given below the table.)
Open Source Code | Yes | Code is available at https://github.com/computri/learnable-weight-sharing.
Open Datasets | Yes | Specifically, we evaluate our model on MNIST images that have been rotated (with full SO(2) rotations) and scaled (with scaling factors between [0.3, 1.0]). ... and CIFAR-10 with flips as a dataset with unknown symmetries. (A sketch of comparable preprocessing appears below the table.)
Dataset Splits | No | The paper mentions “Test accuracy” in its tables and discussion, but does not explicitly detail the training, validation, and test splits (e.g., percentages or sample counts) in the main text.
Hardware Specification | Yes | All the experiments were done on a single GPU with 24GB memory under six hours.
Software Dependencies | No | The paper mentions the availability of code but does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup | Yes | Model architecture: For all MNIST experiments, a simple 5-block CNN was used. Each block uses a kernel size of 5 and is succeeded by instance norm and ReLU activation, respectively. ... The models used a learning rate of 1e-2 and were trained for 100 epochs. (A hedged sketch of such a setup follows below.)
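
As a rough illustration of the iteration quoted in the Pseudocode row (Eq. (8)), the sketch below implements Sinkhorn-style normalization in PyTorch. The function name `sinkhorn_normalize`, the fixed iteration count, and the truncation of the limit $l \to \infty$ to a finite number of steps are assumptions for illustration, not the authors' implementation; see the linked repository for that.

```python
import torch

def sinkhorn_normalize(X: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Approximate the doubly stochastic projection S_NS(X) of Eq. (8).

    Starts from S^0(X) = exp(X) and alternately applies T_r (divide by row
    sums) and T_c (divide by column sums); a fixed number of iterations
    stands in for the limit l -> infinity.
    """
    S = torch.exp(X)                         # S^0(X) = exp(X), elementwise
    for _ in range(n_iters):
        S = S / S.sum(dim=1, keepdim=True)   # T_r: normalize rows, uses sum_r(S)
        S = S / S.sum(dim=0, keepdim=True)   # T_c: normalize columns, uses sum_c(S)
    return S

# Example: a random logit matrix becomes approximately doubly stochastic.
S = sinkhorn_normalize(torch.randn(4, 4))
print(S.sum(dim=0))  # ~ [1, 1, 1, 1]
print(S.sum(dim=1))  # ~ [1, 1, 1, 1]
```

After each iteration the column sums are exactly one and the row sums approach one, which is why a modest number of iterations already yields a near doubly stochastic matrix.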
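The dataset variants described in the Open Datasets row could be approximated with standard torchvision transforms, as in the sketch below. The root path, transform composition, and interpolation defaults are assumptions; the authors' exact preprocessing pipeline may differ.

```python
from torchvision import transforms
from torchvision.datasets import MNIST, CIFAR10

# Assumed preprocessing mirroring the described benchmark variants.
rotated_mnist_tf = transforms.Compose([
    transforms.RandomRotation(degrees=(0, 360)),             # full SO(2) rotations
    transforms.ToTensor(),
])
scaled_mnist_tf = transforms.Compose([
    transforms.RandomAffine(degrees=0, scale=(0.3, 1.0)),    # scale factors in [0.3, 1.0]
    transforms.ToTensor(),
])
flipped_cifar_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                  # flips (unknown-symmetry case)
    transforms.ToTensor(),
])

rotated_mnist = MNIST(root="data", train=True, download=True, transform=rotated_mnist_tf)
flipped_cifar = CIFAR10(root="data", train=True, download=True, transform=flipped_cifar_tf)
```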
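Finally, the Experiment Setup row translates into roughly the following PyTorch skeleton. Channel widths, padding, pooling, the output head, and the choice of Adam are assumptions; only the kernel size, normalization, activation, learning rate, and epoch count are quoted above.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """One block as described: 5x5 convolution -> instance norm -> ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),  # kernel size 5 (quoted)
        nn.InstanceNorm2d(out_ch),                           # instance norm (quoted)
        nn.ReLU(),                                            # ReLU activation (quoted)
    )

# Hypothetical 5-block CNN for 28x28 MNIST inputs; channel widths are illustrative.
widths = [1, 32, 32, 32, 32, 32]
model = nn.Sequential(
    *[conv_block(widths[i], widths[i + 1]) for i in range(5)],
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(widths[-1], 10),
)

# Optimizer choice is an assumption; the quoted setup specifies lr=1e-2 and 100 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
num_epochs = 100
```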