Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

Authors: Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using this powerful tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.
Researcher Affiliation | Collaboration | Róbert Csordás (IDSIA / USI / SUPSI, robert@idsia.ch); Sjoerd van Steenkiste (IDSIA / USI / SUPSI, sjoerd@idsia.ch); Jürgen Schmidhuber (IDSIA / USI / SUPSI / NNAISENSE, juergen@idsia.ch)
Pseudocode | No | The paper describes its method mathematically but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code for all experiments is available at https://github.com/RobertCsordas/modules.
Open Datasets | Yes | SCAN dataset (Lake & Baroni, 2018), Mathematics Dataset (Saxton et al., 2019), CIFAR10 (Krizhevsky et al., 2009), permuted MNIST benchmark (Kirkpatrick et al., 2017; Golkar et al., 2019; Kolouri et al., 2019)
Dataset Splits | Yes | We randomly choose 10k samples for the new validation set; the rest is used as the new train set.
Hardware Specification | No | The paper mentions 'hardware donations from NVIDIA & IBM' and that experiments 'fit on a single GPU with 16Gb of VRAM (2 GPUs for Poly. collect)', but does not specify exact GPU models (e.g., RTX 3090, A100), CPU models, or other detailed hardware components.
Software Dependencies | No | The paper states 'Our method is implemented in PyTorch (Paszke et al., 2019)' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Unless otherwise noted we use the Adam optimizer (Kingma & Ba, 2015), a batch size of 128, a learning rate of 10⁻³, and gradient clipping of 1. The FNN is 5 layers deep, each layer having 2000 units and the LSTM a hidden state size of 256... Mask training uses a learning rate of 10⁻² and β = 10⁻⁴ for regularization. Table 4: Hyperparameters for different tasks on the Mathematics Dataset.
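
The quoted experiment setup maps directly onto a training configuration. Below is a minimal, hypothetical PyTorch sketch of that reported setup (Adam, batch size 128, learning rate 10⁻³, gradient clipping of 1, a 5-layer FNN with 2000 units per layer); the model wiring, input/output sizes, and loss function are placeholder assumptions, the mask-training hyperparameters (learning rate 10⁻², β = 10⁻⁴) appear only as comments, and this is not the authors' released code.

```python
# Hypothetical sketch of the reported base training setup; not the authors' code.
# Reported values: Adam optimizer, batch size 128, learning rate 1e-3, gradient
# clipping of 1, a 5-layer FNN with 2000 units per layer.
import torch
import torch.nn as nn

# Placeholder FNN; input/output sizes are assumptions for illustration only.
model = nn.Sequential(
    nn.Linear(784, 2000), nn.ReLU(),
    nn.Linear(2000, 2000), nn.ReLU(),
    nn.Linear(2000, 2000), nn.ReLU(),
    nn.Linear(2000, 2000), nn.ReLU(),
    nn.Linear(2000, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(batch_x: torch.Tensor, batch_y: torch.Tensor) -> float:
    """One optimization step with the reported gradient clipping of 1."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    return loss.item()

# Mask training (not shown): the paper reports a learning rate of 1e-2 for the
# differentiable weight masks and a regularization term weighted by beta = 1e-4,
# applied while the underlying network weights are kept fixed.
```

In use, batches from a DataLoader with batch_size=128 would be fed to train_step; the LSTM variant (hidden state size 256) and the per-task hyperparameters in the paper's Table 4 would replace the placeholder model where applicable.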