Break It Down: Evidence for Structural Compositionality in Neural Networks

Authors: Michael Lepori, Thomas Serre, Ellie Pavlick

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we leverage model pruning techniques to investigate this question in both vision and language across a variety of architectures, tasks, and pretraining regimens. Our results demonstrate that models often implement solutions to subroutines via modular subnetworks, which can be ablated while maintaining the functionality of other subnetworks. (See the ablation sketch after the table.)
Researcher Affiliation | Academia | Michael A. Lepori (1), Thomas Serre (2), Ellie Pavlick (1); (1) Department of Computer Science, (2) Carney Institute for Brain Science, Brown University
Pseudocode | No | The paper describes the methodology with equations and steps (e.g., in Appendix B), but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present the method steps in a code-like format.
Open Source Code | Yes | Our code is publicly available at https://github.com/mlepori1/Compositional_Subnetworks.
Open Datasets | Yes | Tasks: We extend the collection of datasets introduced in Zerroug et al. (2022), generating several tightly controlled datasets that implement compositions of the following subroutines: contact, inside, and number. ... Tasks: We use a subset of the data introduced in Marvin & Linzen (2019) to construct odd-one-out tasks for language data.
Dataset Splits | Yes | Subject-Verb Agreement: Compositional Dataset: 9500 (Train), 500 (Validation), 1000 (Test) ... Reflexive Anaphora: Compositional Dataset: 2500 (Train), 200 (Validation), 200 (Test)
Hardware Specification | Yes | We used NVIDIA GeForce RTX 3090 GPUs for all experiments.
Software Dependencies | No | The paper mentions models (e.g., Resnet50, BERT-Small) and optimizers (Adam), and notes that SimCLR pretraining was adapted from an existing implementation (Lippe, 2022), but it does not specify software library version numbers (e.g., the PyTorch, TensorFlow, or other library versions used for these implementations).
Experiment Setup | Yes | We perform a hyperparameter search over batch size and learning rate... All models are trained using the Adam optimizer (Kingma & Ba, 2014) with early stopping for a maximum of 100 epochs (patience set to 75 epochs)... During mask training, we use L0 regularization... Following Savarese et al. (2020), we fix β_max = 200, λ = 10^-8, and train for 90 epochs. We train the mask parameters using the Adam optimizer with a batch size of 64 and search over learning rates. (See the mask-training sketch after the table.)
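
To ground the Experiment Setup row, the following is a minimal sketch of L0-regularized mask training via continuous sparsification (Savarese et al., 2020) using the reported settings (β_max = 200, λ = 10^-8, 90 epochs, Adam, batch size 64). It assumes a PyTorch-style setup; MaskedLinear, l0_penalty, train_masks, task_loader, and the cross-entropy task loss are illustrative placeholders rather than the authors' implementation (see the repository linked in the Open Source Code row for that).

# Sketch of continuous-sparsification mask training with an L0 surrogate penalty.
# Settings follow the quoted setup: beta_max = 200, lambda = 1e-8, 90 epochs, Adam, batch size 64.
# Names (MaskedLinear, task_loader, etc.) are hypothetical, not taken from the released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer whose frozen weights are gated by a trainable soft binary mask."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)  # frozen model weights
        self.bias = nn.Parameter(linear.bias.detach(), requires_grad=False) if linear.bias is not None else None
        self.mask_logits = nn.Parameter(torch.zeros_like(self.weight))  # mask parameters s
        self.beta = 1.0  # temperature, annealed toward beta_max during training

    def mask(self):
        return torch.sigmoid(self.beta * self.mask_logits)  # approaches a hard 0/1 mask as beta grows

    def forward(self, x):
        return F.linear(x, self.weight * self.mask(), self.bias)

def l0_penalty(model):
    """Differentiable surrogate for the L0 norm of the mask (number of retained weights)."""
    return sum(m.mask().sum() for m in model.modules() if isinstance(m, MaskedLinear))

def train_masks(model, task_loader, epochs=90, beta_max=200.0, lam=1e-8, lr=1e-3):
    mask_params = [p for n, p in model.named_parameters() if "mask_logits" in n]
    opt = torch.optim.Adam(mask_params, lr=lr)  # only the mask parameters are trained
    for epoch in range(epochs):
        # Linearly anneal the temperature so the soft mask hardens by the end of training.
        beta = 1.0 + (beta_max - 1.0) * epoch / max(epochs - 1, 1)
        for m in model.modules():
            if isinstance(m, MaskedLinear):
                m.beta = beta
        for x, y in task_loader:  # batch size 64 in the reported setup
            loss = F.cross_entropy(model(x), y) + lam * l0_penalty(model)  # placeholder task loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
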
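The Research Type row refers to ablating a discovered subnetwork while verifying that the remaining subnetworks still function. Under the same assumptions, the sketch below illustrates that ablation step; ablate_subnetwork, accuracy, and the mask and data-loader names are hypothetical and not taken from the released code.

# Sketch of the ablation check: zero out the weights selected by a binarized mask for one
# subroutine, then measure accuracy on the other subroutine's task. Names are illustrative.
import torch

@torch.no_grad()
def ablate_subnetwork(model, binary_masks):
    """Remove the masked-in weights (mask == 1) from the full model."""
    for name, param in model.named_parameters():
        if name in binary_masks:
            param.mul_(1.0 - binary_masks[name])  # keep only weights outside the subnetwork
    return model

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=-1) == y).sum().item()
        total += y.numel()
    return correct / total

# Usage sketch: ablating the hypothetical "inside" subnetwork should hurt the inside task but
# leave the "contact" task largely intact if the two subroutines are implemented modularly.
# acc_contact = accuracy(ablate_subnetwork(model, inside_masks), contact_loader)
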