On Plasticity, Invariance, and Mutually Frozen Weights in Sequential Task Learning
Authors: Julian Zilly, Alessandro Achille, Andrea Censi, Emilio Frazzoli
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore the ideas and connections of mutually frozen weights and invariance and their impact in a number of sequential learning settings across different datasets, network architectures, learning rates, and weight decay settings. Detailed specifications will be provided in the appendix. The following subsections are meant to give further evidence for the following statements: (1) mutually frozen weights occur and are different from weights that are zero (sparse) but not mutually frozen; (2) both sufficiently high learning rates and weight decay are essential for the occurrence of mutually frozen weights and for final test performance, as tested across two different architectures; (3) mutually frozen weights at the beginning of training can be harmful yet can be removed through a resetting intervention; (4) across a number of task changes, removing frozen weights is beneficial as long as sufficiently many retraining samples are available; (5) we provide an analysis summary relating frozen weights, invariance, and performance. (An illustrative detection-and-reset sketch follows the table.) |
| Researcher Affiliation | Academia | Julian Zilly (ETH Zürich, jzilly@ethz.ch); Alessandro Achille (Caltech, aachille@caltech.edu); Andrea Censi (ETH Zürich, acensi@ethz.ch); Emilio Frazzoli (ETH Zürich, efrazzoli@ethz.ch) |
| Pseudocode | No | The paper describes methods and interventions in text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Yes, we include our code base, to allow others to easily reproduce our results. |
| Open Datasets | Yes | ResNet18 trained on CIFAR-10 images with weight decay λ = 1e-3 (Fig. 3). Table 2 reports task changes (columns: Task change, Model, Reset, Test acc., Test acc. Double/Triple FW), including CIFAR-10 [49] → CIFAR-100 [49] with ResNet18, MNIST [50] → Fashion-MNIST [51] with ResNet18, and ImageNet → Fashion-MNIST [51] with ResNet50. |
| Dataset Splits | No | The paper uses standard datasets like CIFAR-10 and ImageNet, which typically have predefined splits. However, it does not explicitly state the specific training, validation, and test split percentages or sample counts within the provided text, nor does it cite a specific work that defines the exact splits used. |
| Hardware Specification | No | The paper states 'We do include the use of resources but do not have a precise estimate of the total amount of hours trained.' in the checklist, but it does not provide specific details about the type of GPUs, CPUs, or other hardware models used for the experiments within the main body of the paper. |
| Software Dependencies | No | The paper mentions using 'code frameworks such as PyTorch' in the checklist but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We explore the ideas and connections of mutually frozen weights and invariance and their impact in a number of sequential learning settings across different datasets, network architectures, learning rates, and weight decay settings. Detailed specifications will be provided in the appendix. For pretraining on blurred CIFAR-10 images and then switching to regular sharp images, we show the test loss for the ResNet-18 and All-CNN architectures across different initial learning rates and weight decay settings. (A minimal sketch of this blurred-to-sharp task change follows the table.) |
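
The "resetting intervention" quoted in the Research Type row is described only in prose. As a rough illustration of how such a check and reset could look in PyTorch, the sketch below flags weights whose accumulated gradient stays at zero over one pass through the data and re-initializes them. The criterion (zero accumulated gradient), the helper names `find_frozen_weights` and `reset_frozen_weights`, and the noise scale are assumptions for illustration, not the authors' exact procedure.

```python
import torch


def find_frozen_weights(model, loader, loss_fn, device, tol=0.0):
    """Flag weight entries whose accumulated gradient magnitude stays at
    (or below) `tol` over one pass through the data.

    Illustrative proxy only; the paper's exact criterion for "mutually
    frozen weights" may differ.
    """
    model.to(device).train()
    grad_accum = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                grad_accum[n] += p.grad.abs()
    return {n: g <= tol for n, g in grad_accum.items()}


@torch.no_grad()
def reset_frozen_weights(model, frozen_masks, std=0.01):
    """One possible 'resetting intervention': re-initialize flagged entries
    with small Gaussian noise (the paper may re-initialize differently)."""
    for n, p in model.named_parameters():
        mask = frozen_masks.get(n)
        if mask is not None and mask.any():
            p[mask] = torch.randn_like(p[mask]) * std
```

Usage would be along the lines of `masks = find_frozen_weights(model, train_loader, loss_fn, device)` followed by `reset_frozen_weights(model, masks)` before continuing training on the next task.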
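The Experiment Setup row mentions pretraining on blurred CIFAR-10 images and then switching to regular sharp images across different initial learning rates and weight decay settings. A minimal PyTorch sketch of that task change follows; the blur strength, optimizer, epoch count, batch size, and the particular (learning rate, weight decay) grid are illustrative placeholders, since the paper defers the exact specifications to its appendix.

```python
import torch
import torchvision
from torchvision import transforms

# Blur strength, epochs, batch size, and the (lr, weight-decay) grid below are
# illustrative placeholders; the paper's exact values are in its appendix.
blurred_tf = transforms.Compose([
    transforms.GaussianBlur(kernel_size=5, sigma=2.0),  # task 1: blurred CIFAR-10
    transforms.ToTensor(),
])
sharp_tf = transforms.ToTensor()  # task 2: regular sharp images

blurred_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=blurred_tf)
sharp_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=sharp_tf)
loss_fn = torch.nn.CrossEntropyLoss()
device = "cuda" if torch.cuda.is_available() else "cpu"


def train(model, dataset, lr, weight_decay, epochs=10, batch_size=128):
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=weight_decay)
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()


# Sweep over initial learning rates and weight-decay settings, as in the described setup.
for lr in (0.1, 0.01):
    for wd in (1e-3, 1e-4):
        model = torchvision.models.resnet18(num_classes=10)  # fresh network per setting
        train(model, blurred_set, lr, wd)  # pretrain on blurred images
        train(model, sharp_set, lr, wd)    # then switch to sharp images
```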