On The Specialization of Neural Modules

Authors: Devon Jarvis, Richard Klein, Benjamin Rosman, Andrew M. Saxe

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we confirm that the theoretical results in our tractable setting generalize to more complex datasets and non-linear architectures."
Researcher Affiliation | Academia | "1 School of Computer Science and Applied Mathematics, University of the Witwatersrand; 2 Gatsby Computational Neuroscience Unit & Sainsbury Wellcome Centre, UCL; 3 CIFAR Azrieli Global Scholar, CIFAR. {devon.jarvis,richard.klein,benjamin.rosman1}@wits.ac.za, a.saxe@ucl.ac.uk"
Pseudocode | No | The paper contains mathematical derivations and equations but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Full code for reproducing all figures can be found at: https://github.com/raillab/specialization_of_neural_modules."
Open Datasets | Yes | "To evaluate how well our results generalize to non-linear networks and more complex datasets, in this section we train a deep Convolutional Neural Network (CNN) to learn a compositional variant of MNIST (CMNIST) shown in Figure 4a."
Dataset Splits | No | The paper mentions training and testing sets but does not explicitly describe a validation set or its split; for example, Section 7 discusses "normalized training loss (b) and test loss (c)".
Hardware Specification | No | The paper states, "All experiments are run using the Jax library (Bradbury et al., 2018)," which indicates the software used but provides no specific hardware details such as GPU/CPU models.
Software Dependencies | No | The paper mentions the "Jax library (Bradbury et al., 2018)" and "Python+NumPy programs" but does not specify version numbers for these software components.
Experiment Setup | Yes | Table 3 (hyper-parameters used for the CMNIST experiments): Step Size = 2e-3, Batch Size = 16, Initialization Variance = 0.01.
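
The sketch below illustrates how the Table 3 values could be wired into a JAX training step, consistent with the paper's stated use of the Jax library. It is not the authors' released code: only the three reported values (step size 2e-3, batch size 16, initialization variance 0.01) come from the paper, while the toy two-layer network, the dummy batch, the plain-SGD update, and the interpretation of the initialization variance as the variance of a Gaussian initializer are illustrative assumptions; the paper itself trains a deep CNN on CMNIST.

```python
# Minimal JAX sketch of the reported CMNIST training configuration.
# Only STEP_SIZE, BATCH_SIZE, and INIT_VARIANCE come from Table 3 of the
# paper; the model, data, and optimizer below are illustrative assumptions.
import jax
import jax.numpy as jnp

STEP_SIZE = 2e-3      # "Step Size" from Table 3
BATCH_SIZE = 16       # "Batch Size" from Table 3
INIT_VARIANCE = 0.01  # "Initialization Variance" from Table 3


def init_params(key, in_dim=784, hidden=128, out_dim=10):
    """Gaussian initialization with the reported variance (std = sqrt(variance))."""
    std = jnp.sqrt(INIT_VARIANCE)
    k1, k2 = jax.random.split(key)
    return {
        "w1": std * jax.random.normal(k1, (in_dim, hidden)),
        "w2": std * jax.random.normal(k2, (hidden, out_dim)),
    }


def forward(params, x):
    # Stand-in two-layer network; the paper trains a deep CNN on CMNIST.
    h = jax.nn.relu(x @ params["w1"])
    return h @ params["w2"]


def loss_fn(params, x, y):
    # Mean cross-entropy over the batch.
    logits = forward(params, x)
    log_probs = jax.nn.log_softmax(logits)
    return -jnp.mean(jnp.sum(jax.nn.one_hot(y, 10) * log_probs, axis=-1))


@jax.jit
def sgd_step(params, x, y):
    # Plain SGD with the reported step size (assumed optimizer).
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - STEP_SIZE * g, params, grads)


if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    params = init_params(key)
    # Dummy batch with the reported batch size; real runs would feed CMNIST batches.
    x = jax.random.normal(key, (BATCH_SIZE, 784))
    y = jnp.zeros((BATCH_SIZE,), dtype=jnp.int32)
    params = sgd_step(params, x, y)
    print(loss_fn(params, x, y))
```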