Revisiting Model Stitching to Compare Neural Representations

Authors: Yamini Bansal, Preetum Nakkiran, Boaz Barak

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g., supervised vs. self-supervised learning), can be stitched to each other without drop in performance.
Researcher Affiliation | Academia | Yamini Bansal (Harvard University, ybansal@g.harvard.edu); Preetum Nakkiran (Harvard University, preetum@cs.harvard.edu); Boaz Barak (Harvard University, b@boazbarak.org)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement of, or link to, its own open-source code for the methodology described. It mentions a third-party framework (VISSL), but not the authors' own implementation code.
Open Datasets | Yes | Unless specified otherwise, the CIFAR-10 experiments are conducted on the ResNet-18 architecture (with first layer width 64) and the ImageNet experiments are conducted on the ResNet-50 architecture [He et al., 2015]. ...a Vision Transformer [Dosovitskiy et al., 2020] pretrained on CIFAR-5m [Nakkiran et al., 2021].
Dataset Splits | No | While standard datasets (CIFAR-10, ImageNet) are used, the paper does not give split percentages, sample counts for the splits, or citations to predefined splits for reproducibility. It mentions a "train set" and a "test set" but does not describe how the split was made or whether a validation set was used for hyperparameter tuning.
Hardware Specification | No | The paper mentions the "Satori compute cluster" but does not provide specific hardware details such as GPU or CPU models, or memory specifications used for the experiments.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | The stitching layer in the convolutional networks consists of a 1x1 convolutional layer with input features equal to the number of channels in r, and output features equal to the output channels of A_l(x). We add a BatchNorm (BN) layer before and after this convolutional layer. Note that the BN layer does not change the representation capacity of the stitching layer and only aids with optimization. We use the Adam optimizer with cosine learning rate decay and an initial learning rate of 0.001.
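
The quoted setup maps onto a small amount of code. The following is a minimal PyTorch sketch of the stitching construction as described above: a 1x1 convolution flanked by BatchNorm layers, spliced between the bottom layers of one network and the top layers of another, with the stitching layer trained by Adam under cosine learning-rate decay at an initial rate of 0.001. The module names, the choice to freeze both networks and train only the stitching layer, and the 4-D feature-map shapes are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class StitchingLayer(nn.Module):
        """1x1 convolution mapping the channels of the bottom network's
        activation r into the channel space expected by the top network,
        with a BatchNorm layer before and after (an optimization aid that
        adds no representational capacity)."""
        def __init__(self, in_channels, out_channels):
            super().__init__()
            self.stitch = nn.Sequential(
                nn.BatchNorm2d(in_channels),
                nn.Conv2d(in_channels, out_channels, kernel_size=1),
                nn.BatchNorm2d(out_channels),
            )

        def forward(self, r):
            return self.stitch(r)

    class StitchedModel(nn.Module):
        """Bottom layers of network A composed with the top layers of
        network B through a trainable stitching layer; A and B are kept
        frozen here (an assumption consistent with the stitching setup)."""
        def __init__(self, bottom_a, top_b, stitching_layer):
            super().__init__()
            self.bottom_a, self.top_b = bottom_a, top_b
            self.stitching_layer = stitching_layer
            for p in list(self.bottom_a.parameters()) + list(self.top_b.parameters()):
                p.requires_grad_(False)

        def forward(self, x):
            return self.top_b(self.stitching_layer(self.bottom_a(x)))

    def make_stitch_optimizer(model, num_epochs):
        # Adam on the stitching layer only, with cosine learning-rate decay
        # and an initial learning rate of 0.001, as stated in the setup.
        optimizer = torch.optim.Adam(model.stitching_layer.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
        return optimizer, scheduler

In this sketch, bottom_a would be the first l blocks of one ResNet and top_b the remaining blocks of another, so in_channels and out_channels are the channel counts of the two networks at that depth; where exactly the networks are split is not specified here beyond the architectures quoted above.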