Revisiting Model Stitching to Compare Neural Representations
Authors: Yamini Bansal, Preetum Nakkiran, Boaz Barak
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g., supervised vs. self-supervised learning), can be stitched to each other without drop in performance. |
| Researcher Affiliation | Academia | Yamini Bansal Harvard University ybansal@g.harvard.edu Preetum Nakkiran Harvard University preetum@cs.harvard.edu Boaz Barak Harvard University b@boazbarak.org |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the methodology described. It mentions a third-party framework (VISSL), but not the authors' own implementation code. |
| Open Datasets | Yes | Unless specified otherwise, the CIFAR-10 experiments are conducted on the ResNet-18 architecture (with first layer width 64) and the ImageNet experiments are conducted on the ResNet-50 architecture [He et al., 2015]. ...a Vision Transformer [Dosovitskiy et al., 2020] pretrained on CIFAR-5m [Nakkiran et al., 2021]. |
| Dataset Splits | No | While standard datasets (CIFAR-10, ImageNet) are mentioned, the paper does not explicitly provide specific details on dataset split percentages, sample counts for splits, or clear citations to predefined splits for reproducibility. It mentions a "train set" and "test set" but not the methodology for the split or any explicit validation set usage for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions "Satori compute cluster" but does not provide specific hardware details such as GPU or CPU models, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | The stitching layer in the convolutional networks consists of a 1×1 convolutional layer with input features equal to the number of channels in r, and output features equal to the output channels of A_l(x). We add a BatchNorm (BN) layer before and after this convolutional layer. Note that the BN layer does not change the representation capacity of the stitching layer and only aids with optimization. We use the Adam optimizer with cosine learning rate decay and an initial learning rate of 0.001. |
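
The Experiment Setup row above pins down the stitching layer (a 1×1 convolution wrapped in BatchNorm layers) and its training recipe (Adam, cosine learning-rate decay, initial learning rate 0.001). Below is a minimal PyTorch sketch of that setup; the bottom/top sub-networks, channel counts, and epoch count are illustrative placeholders, not the authors' released code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the first l layers of one trained network and the
# remaining layers of another (in the paper these come from trained ResNets).
bottom_a = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
top_b = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))


class StitchingLayer(nn.Module):
    """1x1 convolution with a BatchNorm layer before and after, per the quoted setup."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.stitch = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stitch(x)


stitch = StitchingLayer(64, 64)  # channel counts are placeholders

# Freeze both pretrained networks; only the stitching layer is trained.
for p in list(bottom_a.parameters()) + list(top_b.parameters()):
    p.requires_grad = False

# Adam with cosine learning-rate decay and an initial learning rate of 0.001,
# as stated in the Experiment Setup row. num_epochs is a placeholder.
num_epochs = 10
optimizer = torch.optim.Adam(stitch.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

x = torch.randn(8, 3, 32, 32)         # dummy CIFAR-sized batch
logits = top_b(stitch(bottom_a(x)))   # stitched forward pass
```

As the quoted setup notes, the surrounding BN layers add no representational capacity beyond the affine 1×1 map itself; they are there only to ease optimization of the stitching layer.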