Break It Down: Evidence for Structural Compositionality in Neural Networks
Authors: Michael Lepori, Thomas Serre, Ellie Pavlick
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we leverage model pruning techniques to investigate this question in both vision and language across a variety of architectures, tasks, and pretraining regimens. Our results demonstrate that models often implement solutions to subroutines via modular subnetworks, which can be ablated while maintaining the functionality of other subnetworks. |
| Researcher Affiliation | Academia | Michael A. Lepori (1), Thomas Serre (2), Ellie Pavlick (1); (1) Department of Computer Science, (2) Carney Institute for Brain Science, Brown University |
| Pseudocode | No | The paper describes the methodology with equations and steps (e.g., in Appendix B), but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present the method steps in a code-like format. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/mlepori1/Compositional_Subnetworks. |
| Open Datasets | Yes | Tasks: We extend the collection of datasets introduced in Zerroug et al. (2022), generating several tightly controlled datasets that implement compositions of the following subroutines: contact, inside, and number. ... Tasks: We use a subset of the data introduced in Marvin & Linzen (2019) to construct odd-one-out tasks for language data. |
| Dataset Splits | Yes | Subject-Verb Agreement: Compositional Dataset: 9500 (Train), 500 (Validation), 1000 (Test) ... Reflexive Anaphora: Compositional Dataset: 2500 (Train), 200 (Validation), 200 (Test) |
| Hardware Specification | Yes | We used NVIDIA GeForce RTX 3090 GPUs for all experiments. |
| Software Dependencies | No | The paper mentions models (e.g., ResNet50, BERT-Small) and optimizers (Adam), and notes that SimCLR pretraining was adapted from an existing implementation (Lippe, 2022), but it does not specify software dependencies or library version numbers (e.g., the PyTorch or other library versions used for these implementations). |
| Experiment Setup | Yes | We perform a hyperparameter search over batch size and learning rate... All models are trained using the Adam optimizer (Kingma & Ba, 2014) with early stopping for a maximum of 100 epochs (patience set to 75 epochs)... During mask training, we use L0 regularization... Following Savarese et al. (2020), we fix β_max = 200, λ = 10⁻⁸, and train for 90 epochs. We train the mask parameters using the Adam optimizer with a batch size of 64 and search over learning rates. (A hedged sketch of this mask-training setup follows the table.) |
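
For readers reproducing the mask-training setup quoted in the "Experiment Setup" row, below is a minimal PyTorch sketch of L0-regularized mask training via continuous sparsification (Savarese et al., 2020), plugging in the hyperparameters quoted above (β_max = 200, λ = 10⁻⁸, 90 epochs, batch size 64). This is an illustrative sketch, not the authors' implementation: the `MaskedLinear` class, `mask_logits` parameter, annealing schedule, toy data, and the learning rate are assumptions for demonstration; the authors' actual code is at the repository linked in the "Open Source Code" row.

```python
# Minimal sketch of continuous-sparsification mask training (Savarese et al., 2020).
# Assumes PyTorch; class/variable names are illustrative, not from the paper's repo.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Frozen linear layer whose weights are gated by a trainable soft binary mask."""
    def __init__(self, weight: torch.Tensor, bias: torch.Tensor):
        super().__init__()
        # Model weights stay frozen; only the mask logits are trained.
        self.weight = nn.Parameter(weight, requires_grad=False)
        self.bias = nn.Parameter(bias, requires_grad=False)
        self.mask_logits = nn.Parameter(torch.zeros_like(weight))
        self.beta = 1.0  # temperature, annealed toward beta_max during training

    def mask(self) -> torch.Tensor:
        # sigmoid(beta * s) approaches a hard 0/1 mask as beta grows.
        return torch.sigmoid(self.beta * self.mask_logits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight * self.mask(), self.bias)

def l0_penalty(model: nn.Module) -> torch.Tensor:
    # Differentiable surrogate for the number of unmasked weights.
    return sum(m.mask().sum() for m in model.modules() if isinstance(m, MaskedLinear))

# Hyperparameters quoted in the table (beta_max = 200, lambda = 1e-8, 90 epochs, batch 64);
# the learning rate is searched over in the paper, so the value here is a placeholder.
BETA_MAX, LAMBDA, EPOCHS, BATCH_SIZE, LR = 200.0, 1e-8, 90, 64, 1e-3

# Toy frozen layer and random data, purely to make the sketch runnable end to end.
layer = MaskedLinear(torch.randn(4, 8), torch.zeros(4))
optimizer = torch.optim.Adam([layer.mask_logits], lr=LR)
x, y = torch.randn(BATCH_SIZE, 8), torch.randn(BATCH_SIZE, 4)

for epoch in range(EPOCHS):
    # Anneal the temperature from 1 up to BETA_MAX over training.
    layer.beta = BETA_MAX ** (epoch / (EPOCHS - 1))
    optimizer.zero_grad()
    task_loss = nn.functional.mse_loss(layer(x), y)
    loss = task_loss + LAMBDA * l0_penalty(layer)
    loss.backward()
    optimizer.step()
```

The design intent is that as β grows toward β_max, sigmoid(β·s) saturates to a near-binary mask over the frozen weights, so the λ-weighted penalty approximately counts how many weights survive, yielding the modular subnetworks the paper ablates.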